On Fri, Jun 27, 2003 at 11:02:44AM -0500, kfogel@collab.net wrote:
> mark benedetto king <mbk@lowlatency.com> writes:
> > I had to do some rcs file mangling recently, with thousands of revisions
> > of very large files. Using co directly was unacceptable because of the
> > O(N^2) nature.
> >
> > Instead, I used a slightly modified rcsparse to extract not only the change
> > metadata, but the deltas themselves, and the fulltext of the HEAD.
> >
> > I took HEAD and picked up the deltas in reverse order, reconstructing
> > all of the fulltexts in N passes (there were no branches in these
> > rcs files).
> >
> > This gave me a tremendous speedup, but wouldn't it also allow us to
> > remove the requirement for "co"?
>
> Yeah -- Greg (Stein) and I were recently talking about doing just
> this, in fact. I assume this technique caused massive disk usage,
> since you had to keep all those fulltexts around in order to avoid the
> N^2 behavior?

Yes, that's true. Luckily for me, I was able to deal with one ,v file at a
time. Since cvs2svn wants all fulltexts for all ,v files for a particular
txn (I presume), that would be quite a bit of data: essentially every
fulltext of every file, all at once.

Aha! We could *invert* each delta as we go. That is, start with HEAD and
work back to rev 1, leaving forward deltas behind along the way, so that
only one fulltext needs to be held at a time. Neat, though it does trade
extra CPU and I/O for the space savings.
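
To make the inversion idea concrete, here is a minimal Python sketch. It is
only an illustration under assumptions: the function name and the tuple
representation of a delta are made up for this example, and the inverse it
produces is not guaranteed to match the command order that rcsdiff itself
would emit. Line numbers are 1-based and refer to the text the delta is
applied to, as in RCS deltas.

```python
def apply_and_invert(old_lines, delta):
    """Apply `delta` to old_lines and return (new_lines, inverse_delta).

    `delta` is a list of tuples (hypothetical representation):
      ("d", start, count)  delete `count` lines starting at line `start`
      ("a", start, lines)  add `lines` after line `start`
    Line numbers are 1-based and refer to old_lines.
    """
    out = []      # the new text being built
    inverse = []  # delta that turns the new text back into old_lines
    src = 0       # index of the next old line not yet copied or deleted
    for op, start, arg in delta:
        if op == "d":
            out.extend(old_lines[src:start - 1])
            removed = old_lines[start - 1:start - 1 + arg]
            # To undo a delete, re-add the removed lines after the
            # last line written to the new text so far.
            inverse.append(("a", len(out), removed))
            src = start - 1 + arg
        elif op == "a":
            out.extend(old_lines[src:start])
            src = start
            # To undo an add, delete the added lines from the new text.
            inverse.append(("d", len(out) + 1, len(arg)))
            out.extend(arg)
        else:
            raise ValueError("unknown delta op: %r" % (op,))
    out.extend(old_lines[src:])
    return out, inverse
```

Starting from HEAD, each call yields the previous revision plus a forward
delta back to the newer one, so only the current fulltext has to stay in
memory while the forward deltas are written out.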

Attached is my quick-hack-in-Perl RCS delta applier. It may not be
completely correct, but it seems to have worked for all of my data.
I'm sure it would port pretty quickly to Python.
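
For readers without the attachment, the following is a rough Python sketch
of applying an RCS delta. It is not a port of the Perl script; the function
names are hypothetical. It assumes the delta is in the "diff -n" style that
RCS stores ("dN M" deletes M lines starting at line N, "aN M" adds the next
M delta lines after line N, with line numbers referring to the text being
edited), that commands appear in ascending line order, and that there are
no branches.

```python
import re

_CMD = re.compile(r"^([ad])(\d+) (\d+)$")


def apply_rcs_delta(old_lines, delta_lines):
    """Apply an RCS-style delta to old_lines and return the new lines."""
    out = []
    src = 0  # index of the next old line not yet copied or deleted
    i = 0
    while i < len(delta_lines):
        m = _CMD.match(delta_lines[i].rstrip("\n"))
        if m is None:
            raise ValueError("bad delta command: %r" % (delta_lines[i],))
        op, start, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if op == "d":
            # Copy everything before the deleted range, then skip it.
            out.extend(old_lines[src:start - 1])
            src = start - 1 + count
        else:
            # Copy through line `start`, then take `count` lines
            # verbatim from the delta itself.
            out.extend(old_lines[src:start])
            src = start
            out.extend(delta_lines[i:i + count])
            i += count
    out.extend(old_lines[src:])
    return out


def all_fulltexts(head_lines, reverse_deltas):
    """Rebuild every revision from HEAD, newest first, one pass per delta."""
    texts = [head_lines]
    for delta in reverse_deltas:
        texts.append(apply_rcs_delta(texts[-1], delta))
    return texts
```

Note that all_fulltexts keeps every revision in memory at once, which is
exactly the space cost discussed above.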
--ben
Received on Fri Jun 27 19:19:36 2003