Re: cvs vs svn repository size

From: Mark Phippard <markphip_at_gmail.com>
Date: Tue, 12 Aug 2008 16:43:00 -0400

On Tue, Aug 12, 2008 at 4:36 PM, Miller, Eric <Eric.Miller_at_amd.com> wrote:
>> From: Paul Koning [mailto:Paul_Koning_at_Dell.com]
>> For more realistic change sizes, especially those that touch multiple
>> files, and for more plausible repository sizes, I don't think there's
>> an issue.
> The second example is fairly typical of the repositories I would be
> converting. Others I have converted are around the 3x mark. Only one
> case was the resulting repository smaller than cvs and it was a trivial
> case.
> These are certainly not contrived examples.
> A cursory inspection shows that near the end of the conversion (r15000)
> average rev size is 56k with less than 1% of the file size attributed to
> the delta itself. It is not a file system cluster size issue.

After sending my email, I realized you had two examples. Your first
example seemed to point to the minimum file size issue: your number
of revisions * 8k equaled your repository size almost exactly.
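
To make that arithmetic concrete, here is a minimal sketch in Python.
It assumes two small files per revision (fsfs writes one file under
db/revs and one under db/revprops for each revision) and a 4K
filesystem block, which together give the 8k-per-revision figure. The
block size, the function names and the revision count in the usage
note are illustrative assumptions, not numbers from this thread.

```python
import math


def allocated_size(size_bytes, block_size=4096):
    """Round a file size up to whole filesystem blocks.

    block_size=4096 is an assumption; check the real value for your
    filesystem.
    """
    if size_bytes <= 0:
        return 0
    return math.ceil(size_bytes / block_size) * block_size


def fsfs_floor(revisions, file_sizes=(1, 1), block_size=4096):
    """Estimate on-disk size when every revision adds the given files.

    file_sizes holds the byte size of each file written per revision.
    The default (1, 1) models a tiny commit: one rev file and one
    revprops file, each far smaller than a block.
    """
    per_revision = sum(allocated_size(s, block_size) for s in file_sizes)
    return revisions * per_revision
```

With these assumptions, fsfs_floor(10000) comes to 81,920,000 bytes,
that is 10000 * 8k, even if each commit changed a single byte. Once
the rev files themselves are tens of kilobytes, block rounding adds
very little on top of the file contents.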

>> From: Mark Phippard [mailto:markphip_at_gmail.com]
>> If you use fsfs you can eat a lot of size just because there are a
>> couple of files created for every revision. So the smallest file size
>> comes in to play. Obviously if the size of the commit is bigger, this
>> is less of an issue, but with a 1-byte change you are still creating
>> multiple new files that each will be what? 8K 16K 32K?
> 55k as mentioned above. However it does not appear to be the "multiple
> files / revision" issue but rather the entry list that accumulates for
> every revision file. The db/revs dir in the cvs2svn run is 555M.

I've seen this mentioned before, though I might have the details
wrong. Your analysis seems right. As I understand it, if a folder
contains a lot of files, or maybe a lot of subfolders, then the
revision file can be large just from this directory metadata.
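
One way to tell the two effects apart is to compare the apparent size
of the files under db/revs with the space they occupy after rounding
up to filesystem blocks. The following is a rough sketch, not a
Subversion tool: the function name and the 4K block size are
assumptions, and it only reads file sizes from disk.

```python
import math
import os


def summarize_rev_files(revs_dir, block_size=4096):
    """Summarize file sizes under revs_dir (e.g. a repository's db/revs).

    block_size=4096 is an assumption about the filesystem.
    """
    sizes = []
    for root, _dirs, files in os.walk(revs_dir):
        for name in files:
            sizes.append(os.path.getsize(os.path.join(root, name)))

    apparent = sum(sizes)
    # Space used once each file is rounded up to whole blocks.
    allocated = sum(math.ceil(s / block_size) * block_size for s in sizes)
    return {
        "files": len(sizes),
        "apparent_bytes": apparent,
        "allocated_bytes": allocated,
        "mean_bytes": apparent / len(sizes) if sizes else 0.0,
        # Fraction of allocated space that is block-rounding slack.
        "slack_fraction": (allocated - apparent) / allocated if allocated else 0.0,
    }
```

A high slack_fraction points at the small-file issue. A low
slack_fraction together with a large mean_bytes means the rev files
themselves are big, which is consistent with the directory metadata
explanation.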

>> BDB is more efficient in this case.
> Correct me if I am wrong, but I am under the impression that BDB is more
> susceptible to database corruption and has performance issues. Would
> you describe that characterization as accurate? I'll run a conversion
> with bdb for completeness.

I would not characterize either of those as true. BDB is susceptible
to "wedging", which requires you to run recovery (svnadmin recover),
but that is not corruption. The "wedge" is something BDB uses to avoid
corruption. Also, if you use a recent BDB, it can auto-recover from
most problems anyway, meaning you do not have to do anything.

There are cases where each performs better/worse than the other.

Also, I think the general consensus is that fsfs produces smaller
repositories. It really depends on the data though. Branches and the
activity on the branch can make a big difference.

Mark Phippard
Received on 2008-08-12 22:43:22 CEST

This is an archived mail posted to the Subversion Users mailing list.