Re: Subversion and Very,Very Large Repositories

From: Roger Ashby <roger.ashby_at_gmail.com>
Date: 2006-02-28 17:29:46 CET

I forgot to CC: the list when I responded to several of the answers I
received so I'm including the conversations here:

On 2/28/06, Roger Ashby <roger.ashby@gmail.com> wrote:
> On 2/28/06, Ryan Schmidt <subversion-2006Q1@ryandesign.com> wrote:
>
> > The FSFS backend stores one file per revision. I don't know how the
> > BDB backend stores things. Most people recommend using FSFS, not BDB.
> > You should use the current version of Subversion, 1.3.0. To properly
> > support large files, you should use APR 1.2 (and, if you want Apache,
> > then Apache 2.2.x), not APR 0.9 (which came with Apache 2.0.x).
>
> I think SVK uses a BDB backend by default, so I will look into the
> FSFS option. Doesn't FSFS save one file per revision? I ask because
> we usually don't have very large commits; however, we have had days
> where a couple of GB of new files were added to the repository.
>

On 2/28/06, Ryan Schmidt <subversion-2006Q1@ryandesign.com> wrote:
>
> On Feb 28, 2006, at 15:56, Roger Ashby wrote:
>
> > On 2/28/06, Ryan Schmidt <subversion-2006Q1@ryandesign.com> wrote:
> >
> >> The FSFS backend stores one file per revision. I don't know how the
> >> BDB backend stores things. Most people recommend using FSFS, not BDB.
> >> You should use the current version of Subversion, 1.3.0. To properly
> >> support large files, you should use APR 1.2 (and, if you want Apache,
> >> then Apache 2.2.x), not APR 0.9 (which came with Apache 2.0.x).
> >
> > I think SVK uses a BDB backend by default, so I will look into the
> > FSFS option. Doesn't FSFS save one file per revision? I ask because
> > we usually don't have very large commits; however, we have had days
> > where a couple of GB of new files were added to the repository.
>
> Yes, the FSFS backend stores one file per revision. So you should
> make sure that you use APR 1.2, which includes support for files >
> 2GB, and also make sure you use a filesystem that supports individual
> files > 2GB.
>
> There was also this bug which will be relevant to you:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2453
>
> As you'll see in the bug notes, it was fixed in APR and will be in
> the next version of APR 1.2 (whatever's released after 1.2.2) so
> you'll either have to wait for such a release or maybe build APR from
> a development snapshot.
>
>
>
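
A minimal sketch of how one might check for files that cross the 2GB
boundary discussed above. The function name `find_large_files` and the
constant `LIMIT_32BIT` are hypothetical, not part of Subversion or APR.
It only reports on-disk file sizes; it says nothing about whether a
given APR build or filesystem actually handles such files.

```python
import os

# 2 GiB - 1: the largest size a signed 32-bit file offset can represent.
LIMIT_32BIT = 2**31 - 1


def find_large_files(root, limit=LIMIT_32BIT):
    """Return (path, size) pairs under root whose size exceeds limit, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            if size > limit:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Pointed at an FSFS repository's db/revs directory, this would list any
revision files already over the limit; pointed at a working copy, it
gives only a rough hint, since a revision file's size is not the same
as the sum of the files committed in it.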

>>>>>New Conversation Start

On 2/28/06, Roger Ashby <roger.ashby@gmail.com> wrote:
> On 2/28/06, Thompson, Graeme (AELE)
> <Graeme.Thompson@smiths-aerospace.com> wrote:
>
> > Is there any logical way that you can split the images into categories
> > and then store these in multiple repositories?
> >
> > This is not to do with limitations of Subversion but to do with having
> > to find a single server that can store that amount of data! If you split
> > it into multiple repositories you would be able to buy a new server and
> > migrate some of your repositories to the new server to spread the load.
>
> Disk storage actually isn't an issue; the Sun server where the
> repository lives is connected to our local SAN and we have a few TB
> of space available on the disk arrays connected to the SAN.
>
> Am I right in my understanding that Subversion only saves the
> delta of data in each revision? So if on any given day the only
> change to the 700 GB repository is the addition of a set of files,
> would only the added files be included in the revision, and would the
> revision be saved in one file when using FSFS? Speaking to the second
> question, if that is the case I may have issues because we do
> sometimes add a couple of GB of data in one day (I plan to commit the
> changes to the repository at the end of each day).
>

On 2/28/06, Thompson, Graeme (AELE)
<Graeme.Thompson@smiths-aerospace.com> wrote:
>
> > Disk storage actually isn't an issue; the Sun server where the
> > repository lives is connected to our local SAN and we have a few TB
> > of space available on the disk arrays connected to the SAN.
> >
> Impressive! - And all in a single volume!
>
> > Am I right in my understanding that Subversion only saves the
> > delta of data in each revision? So if on any given day the only
> > change to the 700 GB repository is the addition of a set of files,
> > would only the added files be included in the revision, and would the
> > revision be saved in one file when using FSFS?
>
> No - see the other posters. It will be one file per commit.
>
> > Speaking to the second
> > question, if that is the case I may have issues because we do
> > sometimes add a couple of GB of data in one day (I plan to commit the
> > changes to the repository at the end of each day).
> >
>
> It *should* only store the binary difference between the files. With
> certain files this is easy, e.g. uncompressed files, txt files, bitmaps
> etc. But if you are dealing with compressed files, e.g. jpg and other
> formats, then this binary diffing can end up with quite large diffs.
>
> --Graeme
>

>>>>>New Conversation Start

On 2/28/06, Roger Ashby <roger.ashby@gmail.com> wrote:
> On 2/28/06, Russ Brown <rbrown@ebuyer.com> wrote:
>
> > Make sure you're using fsfs. Unless one revision (most likely the first
> > one) is greater than your filesystem's max file size limit, you will be
> > fine.
>
> What do you suggest I do if I use FSFS, and one revision (definitely
> the first one) is more than my FS max? We will also occasionally add
> 1-2GB of files in one commit (I plan to commit all of the changes made
> daily).
>

On 2/28/06, Russ Brown <pickscrape@gmail.com> wrote:
> On Tue, 2006-02-28 at 10:19 -0500, Roger Ashby wrote:
> > On 2/28/06, Russ Brown <rbrown@ebuyer.com> wrote:
> >
> > > Make sure you're using fsfs. Unless one revision (most likely the first
> > > one) is greater than your filesystem's max file size limit, you will be
> > > fine.
> >
> > What do you suggest I do if I use FSFS, and one revision (definitely
> > the first one) is more than my FS max? We will also occasionally add
> > 1-2GB of files in one commit (I plan to commit all of the changes made
> > daily).
>
> (apologies for the double-post earlier if you got two: the first email
> address isn't the one subscribed to the list).
>
> I think someone mentioned apr versions as being one potential blocker
> for you in terms of the maximum size of one commit, though I'm not sure
> whether or not that would affect communication over the svn protocol.
>
> In terms of kicking off the repository, one possibility is to create the
> repository and then add and commit files to it incrementally, rather
> than adding them all in one go. Also remember that changes to files are
> sent as diffs, so although you're changing a file that is (say) 70MB,
> the diff might only be 5MB.
>
> Also, it's generally better to commit based on logical changes than to
> commit based on time elapsed, though I concede the latter may make more
> sense in your situation.
>
> --
>
> Russ
>
>
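
A rough sketch of the incremental approach suggested above: group the
initial set of files into batches so that no single commit adds more
than a chosen number of bytes. The function name `plan_batches` and the
`max_bytes` parameter are hypothetical; this only plans the grouping
and does not call Subversion.

```python
def plan_batches(files, max_bytes):
    """Group (path, size) pairs into batches whose total size stays within max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, total = [], [], 0
    for path, size in files:
        # Start a new batch when adding this file would exceed the limit.
        if current and total + size > max_bytes:
            batches.append(current)
            current, total = [], 0
        current.append(path)
        total += size
    if current:
        batches.append(current)
    return batches
```

Each batch would then be added and committed as its own revision. The
sum of working-file sizes is only an approximation of the resulting
revision file size, so it may be wise to leave some headroom below the
actual filesystem or APR limit.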
Received on Tue Feb 28 19:03:11 2006

This is an archived mail posted to the Subversion Users mailing list.