
RE: cvs vs svn repository size

From: Miller, Eric <Eric.Miller_at_amd.com>
Date: Tue, 12 Aug 2008 13:36:54 -0700

> From: Paul Koning [mailto:Paul_Koning_at_Dell.com]
> For more realistic change sizes, especially those that touch multiple
> files, and for more plausible repository sizes, I don't think there's
> an issue.

The second example is fairly typical of the repositories I would be
converting. Others I have converted are around the 3x mark. In only one
case was the resulting repository smaller than the cvs original, and
that was a trivial case.
These are certainly not contrived examples.

A cursory inspection shows that near the end of the conversion (r15000)
the average rev size is 56k, with less than 1% of the file size
attributed to the delta itself. It is not a file system cluster size
issue.
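
As a rough check on the cluster size point, here is a small sketch that
rounds a file up to whole clusters and reports how much of the allocation
is slack. The 4k cluster size is the figure from Paul's example and 56k is
the average quoted above; the function names are made up for illustration.

```python
def allocated_size(file_bytes, cluster_bytes=4096):
    """Bytes occupied on disk once the file is rounded up to whole clusters."""
    clusters = -(-file_bytes // cluster_bytes)  # integer ceiling division
    return clusters * cluster_bytes


def slack_fraction(file_bytes, cluster_bytes=4096):
    """Fraction of the allocated space that holds no file data."""
    alloc = allocated_size(file_bytes, cluster_bytes)
    return (alloc - file_bytes) / alloc if alloc else 0.0


# A 1-byte file wastes almost the whole cluster...
tiny = slack_fraction(1)
# ...but a file one byte over 56k (the worst case near that size)
# wastes well under a tenth of its allocation.
worst_near_56k = slack_fraction(56 * 1024 + 1)
```

With files of this size the rounding loss is at most one cluster per
file, so it cannot by itself account for a 5-6x difference.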

> From: Mark Phippard [mailto:markphip_at_gmail.com]
> If you use fsfs you can eat a lot of size just because there are a
> couple of files created for every revision. So the smallest file size
> comes into play. Obviously if the size of the commit is bigger, this
> is less of an issue, but with a 1-byte change you are still creating
> multiple new files that each will be what? 8K 16K 32K?

About 55k, as mentioned above. However, it does not appear to be the
"multiple files / revision" issue but rather the entry list that
accumulates for every revision file. The db/revs dir in the cvs2svn run
is 555M.
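
For anyone who wants to repeat the measurement, a minimal sketch along
these lines would total up the revision files. It assumes only that the
files live somewhere under db/revs, directly or in subdirectories; the
function name is hypothetical and this is not part of any Subversion tool.

```python
import os


def summarize_revs(revs_dir):
    """Return (file_count, total_bytes, average_bytes) for files under revs_dir."""
    sizes = []
    # os.walk covers files directly in revs_dir as well as in subdirectories.
    for root, _dirs, files in os.walk(revs_dir):
        for name in files:
            sizes.append(os.path.getsize(os.path.join(root, name)))
    total = sum(sizes)
    average = total / len(sizes) if sizes else 0.0
    return len(sizes), total, average
```

Calling summarize_revs() on the db/revs directory of a repository gives
the number of revision files, their combined size in bytes, and the mean
size per file. These are apparent sizes, not allocated disk blocks.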

> BDB is more efficient in this case.

Correct me if I am wrong, but I am under the impression that BDB is more
susceptible to database corruption and has performance issues. Would
you describe that characterization as accurate? I'll run a conversion
with bdb for completeness.

Thanks,
Eric

> -----Original Message-----
> From: Paul Koning [mailto:Paul_Koning_at_Dell.com]
> Sent: Tuesday, August 12, 2008 2:19 PM
> To: Miller, Eric
> Cc: users_at_subversion.tigris.org
> Subject: Re: cvs vs svn repository size
>
> >>>>> "Eric" == Eric Miller <Miller> writes:
>
> Eric> Sorry if this has come up before - I could not find a suitable
> Eric> answer online.
>
> Eric> I'm currently investigating converting some cvs repositories to
> Eric> subversion and have discovered that the svn repositories are
> Eric> taking up a lot more space than the cvs originals.
>
> Eric> I have run a couple of tests:
> Eric>
> Eric> - A script to do 500 commits of random line of text to a
> Eric>   file (fsfs):
> Eric>     CVS Repository: 764k
> Eric>     SVN Repository: 4.3M
> Eric>
> Eric> - A conversion of one cvs "repository" using cvs2svn (trunk
> Eric>   only, fsfs, ~15,000 revisions):
> Eric>     CVS Repository: 109M
> Eric>     SVN Repository: 614M
>
> Eric> Why am I seeing such bloated repositories? Svn is using 5-6x
> Eric> the disk space when I expected to see just the opposite.
>
> But those are tiny samples.
>
> In fsfs, each commit is a single file, but a separate file. That will
> consume disk space according to what your file system has for
> allocation granularity. If the clustersize is 4k, a one byte change
> will take 4k.
>
> In CVS, changes are recorded inside the files so small changes take
> less space.
>
> For more realistic change sizes, especially those that touch multiple
> files, and for more plausible repository sizes, I don't think there's
> an issue.
>
> paul
>

Received on 2008-08-12 22:37:32 CEST

This is an archived mail posted to the Subversion Users mailing list.
