On Fri, Jun 27, 2003 at 11:02:44AM -0500, kfogel@collab.net wrote:
> mark benedetto king <mbk@lowlatency.com> writes:
> > I had to do some rcs file mangling recently, with thousands of revisions
> > of very large files.  Using co directly was unacceptable because of the
> > O(N^2) nature.
> > 
> > Instead, I used a slightly modified rcsparse to extract not only the change
> > metadata, but the deltas themselves, and the fulltext of the HEAD.
> > 
> > I took HEAD and picked up the deltas in reverse order, reconstructing
> > all of the fulltexts in N passes (there were no branches in these
> > rcs files).
> > 
> > This gave me a tremendous speedup, but wouldn't it also allow us to
> > remove the requirement for "co"?
> 
> Yeah -- Greg (Stein) and I were recently talking about doing just
> this, in fact.  I assume this technique caused massive disk usage,
> since you had to keep all those fulltexts around in order to avoid the
> N^2 behavior?

Yes, that's true.  Luckily for me, I was able to deal with one ,v file at a
time.  Since cvs2svn wants all fulltexts for all ,v files for a particular txn
(I presume), that would be quite a bit of data: essentially every fulltext
of every file, all at once.

Aha!  We could *invert* the delta as we go.  I.e., start with HEAD, work
back to rev-1, dropping forward-deltas along the way.  Neat, though
it does trade CPU and I/O for the space savings.
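
To make the inversion idea concrete, here is a minimal Python sketch. It
assumes the deltas are `diff -n` style edit scripts ("aN M" adds M lines
after line N, "dN M" deletes M lines starting at line N, with line numbers
referring to the text being edited), and that texts and deltas are held as
lists of lines. The name apply_and_invert is hypothetical, and the inverse
commands come out in the order this sketch finds convenient, which may not
be the order diff itself would emit.

```python
import re

# One edit command: "a<line> <count>" or "d<line> <count>".
_CMD = re.compile(r"([ad])(\d+) (\d+)")


def apply_and_invert(old, delta):
    """Apply `delta` to `old`; return (new, inverse).

    `inverse` is an edit script in the same format that turns `new`
    back into `old`.  All three are lists of lines.
    """
    new, inverse = [], []
    src = 0  # 0-based index of the next unread line of `old`
    i = 0
    while i < len(delta):
        m = _CMD.fullmatch(delta[i].rstrip("\n"))
        if m is None:
            raise ValueError("bad delta command: %r" % delta[i])
        op, start, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if op == "d":
            # Copy everything before the deleted range, then skip it.
            new.extend(old[src:start - 1])
            removed = old[start - 1:start - 1 + count]
            # Inverse: put the removed lines back after the current
            # end of `new`.
            inverse.append("a%d %d\n" % (len(new), count))
            inverse.extend(removed)
            src = start - 1 + count
        else:
            # Copy everything up to and including line `start`.
            new.extend(old[src:start])
            src = max(src, start)
            # Inverse: delete the lines we are about to add.
            inverse.append("d%d %d\n" % (len(new) + 1, count))
            new.extend(delta[i:i + count])
            i += count
    new.extend(old[src:])
    return new, inverse
```

Walking back from HEAD with this, only one fulltext needs to be held at a
time; the inverse scripts left behind can then be replayed from the oldest
revision forward.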

Attached is my quick-hack-in-Perl RCS delta applier.  It may not be
completely correct, but it (seems to have) worked for all of my data.
I'm sure it would port pretty quickly to Python.
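
The attachment is not reproduced here. As a rough idea of what such an
applier might look like in Python, here is a minimal sketch, not a port of
the Perl script. It assumes `diff -n` style deltas ("aN M" / "dN M") held as
lists of lines and a single trunk with no branches; apply_delta and
walk_back are hypothetical names.

```python
import re

# One edit command: "a<line> <count>" or "d<line> <count>".
_CMD = re.compile(r"([ad])(\d+) (\d+)")


def apply_delta(lines, delta):
    """Return the result of applying one edit script to `lines`."""
    out = []
    src = 0  # 0-based index of the next unread line of `lines`
    i = 0
    while i < len(delta):
        m = _CMD.fullmatch(delta[i].rstrip("\n"))
        if m is None:
            raise ValueError("bad delta command: %r" % delta[i])
        op, start, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if op == "d":
            # Keep lines before the deleted range, then skip the range.
            out.extend(lines[src:start - 1])
            src = start - 1 + count
        else:
            # Keep lines through `start`, then splice in the new ones.
            out.extend(lines[src:start])
            src = max(src, start)
            out.extend(delta[i:i + count])
            i += count
    out.extend(lines[src:])
    return out


def walk_back(head, deltas):
    """Yield HEAD, then each older fulltext in turn.

    `deltas` is ordered newest-first; each one turns a revision into
    its predecessor.  Only one fulltext is held at a time.
    """
    text = head
    yield text
    for delta in deltas:
        text = apply_delta(text, delta)
        yield text
```

Each delta is applied exactly once, so N revisions take N passes rather
than the O(N^2) work of checking each one out from scratch.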

--ben
Received on Fri Jun 27 19:19:36 2003