Re: Antwort: Re: cvs2svn.py fails converting repository with the attached file

From: Greg Stein <gstein_at_lyra.org>
Date: 2003-06-27 22:49:12 CEST

On Fri, Jun 27, 2003 at 01:43:26PM -0400, Daniel Berlin wrote:
> On Friday, June 27, 2003, at 12:41 PM, mark benedetto king wrote:
> >On Fri, Jun 27, 2003 at 10:14:16AM -0500, kfogel@collab.net wrote:
> >>which I'm pretty sure does not support the extended RCS file
> >>definition.

rcsparse does not support the "extended" RCS definition. I prefer standards
over "extensions". My standard knee-jerk is to say "screw the CVSNT guys. Go
get your tools from them if they're going to monkey with the format." But the
more reasonable side of me says to temper that :-)

> >>Maybe it should, but then we'd probably also have to get
> >>an improved version of RCS 'co', or find some other means to obtain
> >>specific revisions from a ,v file...

We've already got Python code to fetch fulltexts. But...

>...
> The reason we use it is because it was *much* faster than doing the
> delta application in python the inefficient way (applying deltas one by
> one), which is what cvs2svn did originally.

Right. I had that in there, and Dan changed it over to 'co' :-) (about a
year ago, when he provided the monster patch to take cvs2svn the last mile
to building a repository)

>...
> >Instead, I used a slightly modified rcsparse to extract not only the
> >change
> >metadata, but the deltas themselves, and the fulltext of the HEAD.
> >
> >I took HEAD and picked up the deltas in reverse order, reconstructing
> >all of the fulltexts in N passes (there were no branches in these
> >rcs files).

I'll note that one of Daniel's patches did this -- caching the fulltexts as
they were built. But I didn't fold in that part. Eventually, the switch to
'co' was made, and we just stopped worrying about the caching.
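
For reference, the reverse-delta walk with caching described above could be
sketched roughly as follows. This is a minimal illustration, not the code
from cvs2svn, rcsparse, or Daniel's patch. The names apply_rcs_delta and
build_trunk_fulltexts are hypothetical, texts are held as lists of lines
without trailing newlines, and it assumes a trunk-only ,v file whose delta
scripts use the standard RCS "aL N" / "dL N" commands.

```python
import re

# An RCS edit command: "a<line> <count>" or "d<line> <count>".
_CMD = re.compile(r'^([ad])(\d+) (\d+)$')


def apply_rcs_delta(lines, delta):
    """Apply one RCS delta script to a fulltext and return the new fulltext.

    `lines` is the current fulltext as a list of lines; `delta` is the edit
    script as a list of lines. Line numbers in the script are 1-based and
    refer to the text *before* the delta is applied.
    """
    out = []
    src = 0  # index of the next line of `lines` not yet copied or deleted
    i = 0
    while i < len(delta):
        m = _CMD.match(delta[i])
        if m is None:
            raise ValueError('bad RCS edit command: %r' % (delta[i],))
        op, start, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if op == 'd':
            # Copy everything before line `start`, then skip `count` lines.
            out.extend(lines[src:start - 1])
            src = start - 1 + count
        else:
            # Copy everything up to and including line `start`, then insert
            # the next `count` lines of the script.
            out.extend(lines[src:start])
            src = start
            out.extend(delta[i:i + count])
            i += count
    out.extend(lines[src:])
    return out


def build_trunk_fulltexts(head_rev, head_lines, deltas):
    """Rebuild every trunk fulltext in one pass, caching each as it is built.

    `deltas` is a list of (revision, delta_lines) pairs ordered from the
    revision just below HEAD down to the oldest one (an assumed layout).
    """
    cache = {head_rev: list(head_lines)}
    current = cache[head_rev]
    for rev, delta in deltas:
        current = apply_rcs_delta(current, delta)
        cache[rev] = current
    return cache


texts = build_trunk_fulltexts(
    '1.3', ['a', 'b', 'c'],
    [('1.2', ['d2 1', 'a2 1', 'B']),
     ('1.1', ['d1 1', 'a3 1', 'd'])])
```

Each delta is applied exactly once here, so N revisions cost N-1 delta
applications, instead of re-deriving every revision from HEAD.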

> >This gave me a tremendous speedup, but wouldn't it also allow us to
> >remove the requirement for "co"?
>
> Only if it's faster than co.

Right. I know from tests that 'co' is *very* much faster than any algorithm
based around rcsparse. Even with the 'tparse' parser plugged under rcsparse,
we can't assemble the fulltexts as fast as co.

That said, if you've got (say) 10,000 revisions and you'll need to assemble
all of them, then assembly/caching *could* be faster than running 'co' that
many times.

The question is: where is the tipping point? What kind of ,v file
establishes that point? E.g. large changes? Random changes? A large file? How
many?
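
One way to start reasoning about the tipping point is a crude cost model.
The sketch below is only that: every function name and every parameter is
hypothetical, no measured figures are behind it, and it assumes a
trunk-only file where 'co' pays a fixed per-invocation cost plus one delta
application per step away from HEAD, while the cached approach pays one
parse plus one (slower) delta application per revision.

```python
def co_total_cost(n, spawn, co_delta):
    """Assumed cost of running 'co' once for each of n trunk revisions.

    The revision k steps below HEAD needs k delta applications, so the
    delta work sums to n*(n-1)/2. `spawn` covers process start-up and
    reading the ,v file on every invocation.
    """
    return n * spawn + co_delta * n * (n - 1) / 2.0


def cached_total_cost(n, py_parse, py_delta):
    """Assumed cost of parsing once and walking the deltas with a cache."""
    return py_parse + py_delta * (n - 1)


def tipping_point(spawn, co_delta, py_parse, py_delta, limit=100000):
    """Smallest revision count at which caching beats repeated 'co'.

    Returns None if caching never wins within `limit` revisions.
    """
    for n in range(1, limit + 1):
        if cached_total_cost(n, py_parse, py_delta) < co_total_cost(n, spawn, co_delta):
            return n
    return None


# Purely illustrative inputs in arbitrary units, not measurements.
example = tipping_point(spawn=1.0, co_delta=0.0, py_parse=10.0, py_delta=0.5)
```

Real answers would need timings from actual ,v files, since delta size and
file size change both per-delta terms.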

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Received on Fri Jun 27 22:44:34 2003
