On Fri, Jun 13, 2003 at 01:08:14PM -0500, kfogel@collab.net wrote:
>
> Some questions:
>
> - Have you tested the driver on any really big repositories, like
> the FreeBSD CVS repository (2.3 gigs)? Also, that one's good
> because it has a lot of edge cases -- twice-deleted files,
> branches where some files are branched much later than others
> ("split" branches), etc.
Perforce is testing this internally for cvs->p4 situations. We have
gotten through the XFree86 tree, which contains some odd situations as
well (like two branch tags applied to the same magic version number).
> - Is it holding a lot of state in memory, such as all the branch
> paths and things like that?
VCP holds all of the "state" in SDBM files *except* the list of
revisions to transfer. That's in RAM now and is slated to go to disk
very soon (it's preventing full FreeBSD cvs->p4 testing due to excessive
RAM utilization).
This will also allow "scan once, then convert, test, edit, convert,
test, etc." cycling.
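To illustrate the idea of moving the revision list out of RAM, here is a minimal sketch in Python (not VCP's Perl code, and not its actual on-disk layout). The `RevisionStore` class, its method names, and the `"time"` metadata field are all hypothetical; Python's `dbm.dumb` module stands in for SDBM. The scan step writes each revision to disk, and a later convert step can reopen the same file without rescanning.

```python
import dbm.dumb
import json


class RevisionStore:
    """Hypothetical disk-backed list of revisions awaiting transfer."""

    def __init__(self, path):
        # "c" opens the database, creating it if it does not exist yet.
        self._db = dbm.dumb.open(path, "c")

    def add(self, rev_id, metadata):
        # Metadata is serialized to JSON so only the record being
        # written needs to be held in memory.
        self._db[rev_id.encode()] = json.dumps(metadata).encode()

    def __len__(self):
        return len(self._db)

    def in_time_order(self):
        # Only (time, id) pairs are sorted in memory; the full metadata
        # is read back from disk one record at a time.
        index = sorted(
            (json.loads(self._db[key])["time"], key.decode())
            for key in self._db.keys()
        )
        for _, rev_id in index:
            yield rev_id, json.loads(self._db[rev_id.encode()])

    def close(self):
        self._db.close()
```

Because the store persists between runs, a scan can populate it once and any number of convert/test/edit cycles can reopen it afterwards.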
> > as i saw in the profiling from vcp log, svn commit takes some time, the
> > longest is 20 sec or so for one large commit. but the bottle neck right
> > now is how it extracts every revision from cvs: doing cvs checkout -r
> > <revision> <onefile> for every file. i'll be implementing fast retrieval
> > of cvs by setting date tag and verifying the resulting revision, hopefully
> > this would boost conversion time. but more importantly is that the
> > conversion is incremental, so even if the very first conversion of a
> > large repository is slow, subsequent conversion of newly committed
> > files won't take long.
>
> Well, the total conversion time is still important -- many sites will
> be converting once and then using just Subversion. For them, the main
> issue is "How long will my developers be shut out of the repository
> during this conversion?"
VCP can run against a live repo once (slowly) and then be used to grab
changes made while it was running the first time. So the lockout period
is the duration of the second (fast) conversion.
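The two-pass scheme can be sketched roughly as follows, in Python rather than VCP's Perl. This is an illustration under stated assumptions, not VCP's actual mechanism: the `convert` function, the `transferred` set, and the list-based source and destination are all hypothetical stand-ins for real repositories.

```python
def convert(source_revs, transferred, dest):
    """Copy every revision not already transferred; return the count copied.

    source_revs: iterable of (rev_id, payload) pairs (hypothetical source)
    transferred: set of rev_ids already present in the destination
    dest:        list standing in for the destination repository
    """
    copied = 0
    for rev_id, payload in source_revs:
        if rev_id in transferred:
            continue  # already handled by an earlier pass
        dest.append((rev_id, payload))
        transferred.add(rev_id)
        copied += 1
    return copied


# First (slow) pass runs while developers keep committing.
live_repo = [("a.c#1.1", "v1"), ("a.c#1.2", "v2"), ("b.c#1.1", "v1")]
transferred, dest = set(), []
first = convert(live_repo, transferred, dest)

# A commit lands during the first pass.
live_repo.append(("b.c#1.2", "v2"))

# Second (fast) pass runs with the repository locked and only has to
# copy what changed in the meantime.
second = convert(live_repo, transferred, dest)
```

In this sketch the lockout only needs to cover the second call, whose cost depends on the number of new revisions rather than the size of the repository.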
That said, VCP::Source::cvs must get much faster.
Likewise, we found with VCP::Dest::p4 that direct API access (please
tell me someone's working on a Perl SubVersion::LibSVN or some such) is
much faster for a destination driver, even one that is optimized to
spawn as few child processes as possible.
> If 2000 revisions was 7 hours, then (say) the main GNU toolchain
> repository would probably need a conversion time of several days.
Ew.
> (I'm not sure how using date instead of revision will help your CVS
> retrieval time? I would think the Subversion commits are a huge
> bottleneck... outputting to a dumpfile and then loading it might save
> a lot of time.)
This approach was also considered for VCP::Dest::p4. It is probably the
fastest way to go, but would require maintaining backward compatibility
as svn evolves. Why does svn take so long to commit (an operation
which should be lightning fast), and when is it likely to get faster?
- Barrie
Received on Tue Jun 17 15:43:02 2003