Re: Help with very large repositories

From: Michael Muller <mmuller_at_enduden.com>
Date: 2005-08-27 20:51:19 CEST

Many thanks to those of you who responded; this is certainly a very responsive
mailing list!

All of what I'm hearing fits: the original developers chose to organize our
codebase into libraries with a huge number of source files. The 9200-file set
I described is a _single library_, and its RCS files are all stored in the
same directory.

I don't think the code base has any branches or tags, so at least copying of
that directory should not be an issue. But I will try reorganizing the file
set into more of a tree and see if that makes any difference.
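
As a rough illustration of that kind of reorganization, here is a minimal
sketch that plans a split of one flat directory into subdirectories keyed by
filename prefix. The function names, the two-character prefix width, and the
"_misc" fallback bucket are all hypothetical choices made for the example;
nothing here reflects the actual codebase layout, and it only computes a
plan without moving any files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def bucket_for(filename: str, width: int = 2) -> str:
    """Return a hypothetical subdirectory name for one file.

    The bucket is the lowercased first `width` characters of the base
    name; anything not purely alphanumeric falls into "_misc".
    """
    base = PurePosixPath(filename).name
    prefix = base[:width].lower()
    return prefix if prefix.isalnum() else "_misc"


def plan_layout(filenames, width: int = 2) -> dict:
    """Group filenames by bucket, preserving input order within a bucket."""
    layout = defaultdict(list)
    for name in filenames:
        layout[bucket_for(name, width)].append(name)
    return dict(layout)


if __name__ == "__main__":
    # Made-up RCS file names, purely for demonstration.
    sample = ["alpha.c,v", "alloc.c,v", "beta.h,v", "_tmp.c,v"]
    for bucket, names in sorted(plan_layout(sample).items()):
        print(bucket, names)
```

A real split would more likely follow the logical structure of the library
than an alphabetical prefix; the prefix rule is only the simplest way to
show a flat directory becoming a tree.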

Thanks again!

Ben Collins-Sussman wrote:
>
> On Aug 27, 2005, at 1:09 PM, Michael Muller wrote:
>
> >
> > Ben Collins-Sussman wrote:
> >
> >> This is a cvs2svn scalability question. You need to resend this mail
> >> to the cvs2svn users@ list, and also say what version of cvs2svn
> >> you're using.
> >>
> >
> > No, I'm not concerned about the scalability of cvs2svn. As I said, if I
> > produce a dumpfile (something like "cvs2svn --dump-only --dumpfile
> > repo.dump /usr/local/cvsroot") it only takes about half an hour.
> > Loading it with "svnadmin load" appears to take longer and longer for
> > each revision.
>
> Ah, sorry.
>
> Part of the problem is that cvs2svn generates a huge number of
> branches and tags -- especially tags. So you've got lots of
> revisions that keep creating entries under the /tags directory.
>
> On top of that, there's definitely a db-schema inefficiency in our
> BerkeleyDB repository implementation. We store a directory's
> children in a single lisp-like s-expression: (child1 child2
> child3 ...). When you add a new child to a directory in one commit,
> we have to write out the entire expression again. If N is the number
> of children in a single directory to begin with, then it takes O(N)
> time to add a new child in the one commit.
>
> The FSFS repository implementation still has a similar O(N) problem,
> though not quite as bad. (One of our developers, ghudson, speculates
> it's only different from the BerkeleyDB implementation by a constant
> factor.)
>
> So, what you're experiencing is a combination of two problems:
>
> 1. cvs2svn creating huge directories from your huge codebase and
> history,
>
> 2. a less-than-ideal db schema, one not best suited for
> directories with huge numbers of children.
>
>
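
The O(N) cost per added child described above adds up to roughly quadratic
work when a large directory is populated one commit at a time. The following
is a toy cost model only, not Subversion code: it counts how many directory
entries get written, assuming every commit rewrites the full listing of the
directory it touches. The tree variant additionally assumes files are spread
round-robin over a fixed number of subdirectories and that the parent's
listing is rewritten on each commit as well. Function names and the bucket
count of 96 are made up for the illustration.

```python
def flat_cost(num_files: int) -> int:
    """Entries written when files are added one per commit to a single
    directory whose whole listing is rewritten on every add."""
    total = 0
    size = 0
    for _ in range(num_files):
        size += 1
        total += size  # the entire listing is written out again
    return total


def tree_cost(num_files: int, buckets: int) -> int:
    """Same model, with files spread round-robin over `buckets`
    subdirectories. Each add rewrites the touched subdirectory's
    listing plus the parent's listing of non-empty subdirectories."""
    sizes = [0] * buckets
    used = 0  # number of subdirectories that hold at least one file
    total = 0
    for i in range(num_files):
        b = i % buckets
        if sizes[b] == 0:
            used += 1
        sizes[b] += 1
        total += sizes[b] + used
    return total


if __name__ == "__main__":
    n = 9200  # size of the single library mentioned in this thread
    print("flat:", flat_cost(n))
    print("tree:", tree_cost(n, 96))
```

Under these assumptions the flat layout writes n(n+1)/2 entries in total,
while the tree layout writes far fewer, which is why splitting the directory
is worth trying. The model ignores everything else a real load does, so it
says nothing about actual wall-clock times.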

=============================================================================
michaelMuller = mmuller_at_enduden.com | http://www.mindhog.net/~mmuller
-----------------------------------------------------------------------------
There is no way to find the best design except to try out as many designs as
possible and discard the failures. - Freeman Dyson
=============================================================================

Received on Sat Aug 27 20:58:30 2005
