
How to generate text deltas for cvs2svn using the Python bindings?

From: Michael Haggerty <mhagger_at_alum.mit.edu>
Date: 2007-03-31 20:37:57 CEST


We're working on some new features to speed up cvs2svn conversions, and
it would be very helpful to be able to generate text deltas from
cvs2svn. So...

Short question (more background below):

Suppose I have two versions of a file (for example, in Python strings or
file-like objects) and I want to generate the text delta between them in
a form that I can stick in an svndump file. Can I do this using the
Python SVN bindings?

Example code would be very welcome.

Also, I would like to know what versions of the python bindings this
procedure works with. (Ideally we would like to be pretty flexible
about the version required, as people sometimes do their repository
conversion on the biggest computer they can find, and there is no
telling what is installed there.)

More background:

There are two very significant improvements that could be made to
cvs2svn if we could generate text deltas.

1. The output of cvs2svn is either a complete dumpfile, or dumpfile
fragments that are loaded immediately via svnadmin. Currently, cvs2svn
can only generate full-text dumpfiles. These are much bigger than
necessary, requiring far more temporary conversion space than they
should. Generating delta dumpfiles would reduce the waste.

2. The slowest part of cvs2svn is invoking rcs or cvs zillions of times
to extract the text of each revision out of CVS. But Oswald Buddenhagen
has been working on code to allow cvs2svn to extract the revision text
while it is parsing the CVS files. Currently the code stores CVS diffs
in a temporary database, and recreates the fulltext as needed by
combining the fulltext from the previous version with the diff from the
desired version. The problem is that this algorithm has to keep the
fulltext of every "live" file on every "live" branch available. ("Live"
here means a file or branch that has already been seen but still has
revisions waiting to be converted.) What we would like to do is convert
the CVS diffs to SVN text deltas right at the beginning, so that we
don't need the revision fulltext while generating the output dumpfile.
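
To make the "fulltext plus diff" step concrete, here is a rough sketch of
how a stored diff could be applied to the previous fulltext. It assumes
the diffs are kept as RCS-style edit scripts ("aN M" adds M lines after
old line N, "dN M" deletes M lines starting at old line N). The function
name apply_rcs_delta is made up for illustration and is not taken from
the cvs2svn code.

```python
import re

# An edit command looks like "a12 3" or "d7 2".
_COMMAND = re.compile(r"^([ad])(\d+) (\d+)$")


def apply_rcs_delta(old_lines, delta_lines):
    """Return the new fulltext (a list of lines) obtained by applying an
    RCS-style edit script to old_lines.

    Line numbers in the script refer to the OLD text and are 1-based.
    Commands are assumed to appear in ascending line order.
    """
    new_lines = []
    pos = 0  # number of old lines already copied or deleted
    i = 0
    while i < len(delta_lines):
        match = _COMMAND.match(delta_lines[i].rstrip("\n"))
        if match is None:
            raise ValueError("bad delta command: %r" % (delta_lines[i],))
        op = match.group(1)
        start = int(match.group(2))
        count = int(match.group(3))
        i += 1
        if op == "d":
            # Copy the untouched lines before the deleted range, then
            # skip the deleted lines.
            new_lines.extend(old_lines[pos:start - 1])
            pos = start - 1 + count
        else:
            # Copy old lines up to and including line `start`, then
            # append the `count` new lines that follow the command.
            new_lines.extend(old_lines[pos:start])
            pos = max(pos, start)
            new_lines.extend(delta_lines[i:i + count])
            i += count
    # Whatever remains of the old text is unchanged.
    new_lines.extend(old_lines[pos:])
    return new_lines


if __name__ == "__main__":
    old = ["one\n", "two\n", "three\n"]
    delta = ["d2 1\n", "a2 1\n", "TWO\n", "a3 1\n", "four\n"]
    print("".join(apply_rcs_delta(old, delta)))
```

The point of the sketch is that old_lines must be in memory for every
file whose next revision has not been reached yet, which is exactly the
cost that converting straight to SVN text deltas would avoid.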

Some of the benefit of text deltas would be lost if we had to fork
another program to create the deltas. That is why we would like to
generate them via the Python bindings if possible.


Received on Sat Mar 31 20:41:13 2007

This is an archived mail posted to the Subversion Dev mailing list.
