Re: Repository Size Growing VERY Quickly

From: Jared Hardy <jaredhardy_at_gmail.com>
Date: 2006-07-20 11:02:32 CEST

J Kramer wrote:

> As I understand it, SVN does do binary diffs in order to
> reduce the data storage, but is it really efficient at that?
>
My company runs an SVN repository whose contents are mostly binary
files. Even at 10K+ revisions and 20K+ files, the whole repository
store is about the same size as the working copy (both about 18 GB).
Most of the files are fairly raw art files or small binary exports,
with small incremental changes committed after creation. Vdelta
definitely can't handle compressed files very well, especially
lossy-compressed ones. I recommend versioning only smallish, raw
source binary files. Any file that can be rebuilt from source has no
direct need to be versioned, though we do version some "interim
build" objects when they depend strongly on the current project
state. Definitely defer any compression steps in your build process
until after versioning. We also version all internally developed
build-tool executables, since our project has a strong build-version
to art-data dependency.
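    One small related tip: if you do keep raw binaries under version
control, make sure the client marks them with a binary MIME type up
front, so nothing ever tries to merge or keyword-expand them. A
rough sketch of the client-side ~/.subversion/config for that (the
extensions and MIME types here are just examples, not a prescription
from our actual pipeline):

    [miscellany]
    enable-auto-props = yes

    [auto-props]
    # Treat raw art sources as opaque binaries, and require a lock
    # before editing, since binaries can't be merged anyway.
    *.psd = svn:mime-type=application/octet-stream; svn:needs-lock=*
    *.tga = svn:mime-type=application/octet-stream
    *.wav = svn:mime-type=audio/x-wav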

> Are there any tips that you guys have for reducing the size of the
> repository?

I had similar questions before I converted our project to
Subversion. I eventually stopped worrying about them, because there
simply isn't another SCM system out there that versions our binary
files as well, with as little network and storage overhead. The
closest at the time was Avid AlienBrain, which had a nice
"bucketing" system for the kind of pruning you describe, plus lots
of nice artist-centric interfaces. Unfortunately, the rest of its
repository model was bonehead-stupid, slow, and disk-hungry, and the
licensing costs were ridiculous. We evaluated Perforce for a long
time, but it simply doesn't handle binary file versioning well at
all (it just numbers each revision, gzips it, and drops it in a big
directory), and it requires a lot more maintenance to keep its
metadata database fast. Perforce also had far too little tolerance
for our frequently-offline work style, and CVS had obvious, major
issues handling binary files properly.
    I figure it's easier to buy more storage now than to worry about
it. There are tricks you can do with branches and svndumpfilter.
I've even considered periodically dumping the whole repo, loading
only the most recent revisions into one new repo, loading the rest
into an "archive" repo, and punting that archive off to a cheaper
NAS volume (slower ATA disks) to keep around as a separate read-only
repo; roughly the sequence sketched below. It just turns out that I
gave it plenty of disk space up front, way more than we needed, so I
haven't needed to do any of that yet.
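    Something like the following, if I ever do it (untested on my
end, and the cutoff revision 9000 is just a placeholder for wherever
you want to draw the line):

    # Archive repo: all the old history up to the cutoff.
    svnadmin dump /repos/project -r 0:9000 > archive.dump
    svnadmin create /repos/project-archive
    svnadmin load /repos/project-archive < archive.dump

    # Live repo: the first revision in the range is dumped as a
    # full snapshot, so it loads cleanly into an empty repository.
    svnadmin dump /repos/project -r 9000:HEAD > recent.dump
    svnadmin create /repos/project-live
    svnadmin load /repos/project-live < recent.dump

    Note that the new live repo renumbers its revisions starting
from 1, so old revision numbers only mean anything against the
archive copy. Adding --deltas to the dump commands can also shrink
the dumpfiles considerably.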

    It's a matter of costs, really. Which is more costly to you:
frequent maintenance time spent keeping only the binary versions you
need, periods of downtime to partition off the repo history, or just
buying bigger disks once in a great while and moving the repo to the
larger volumes when needed? To me, disk is cheap and just keeps
getting bigger and cheaper by the day. It's definitely cheaper than
my time.

:) Jared

