[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

vdelta questions

From: Greg Hudson <ghudson_at_mit.edu>
Date: 2000-08-09 20:27:29 CEST

First: what's up with our mailing list archives? They seem to stop at
June 15.

I've been wrapping my head around the vdelta algorithm as described in
"Delta Algorithms: An Empirical Analysis". I've done a very basic
implementation of the core block-move generation algorithm and have
run into an ambiguity. Since at least one other developer stated an
intention to work on the binary diff task, I'm hoping someone else has
read up on the algorithm and might have an interpretation.

vdelta maintains a hash table of indexes keyed by the four bytes
starting at that index. Do we think this hash table is supposed to be
chained, so that it preserves all indexes which have been entered, or
do you just replace an entry in the hash table if there was already
something there? (Either because of a hash collision or because
you're entering an index with the same four-byte sequence as a
previous entry.) The description of how comparisons are done does not
mention trying multiple possible matches and seeing which one yields
the longest match, but the last diagram in the example shows a "*" at
position 0 and 13 (both of which have "bcde" as the next four bytes),
and the text following the example says, "Note that the above matching
algorithm will actually find the longest match if indexes are kept for
every location in the string," which would only be true if multiple
match candidates were tried at each comparison.

This is purely an issue of delta performance. vdelta is supposed to
generate diffs about as small as bdiff's and run much faster, and I'd
like to have a good guess as to whether that level performance (both
in terms of runtime and diff size) can be obtained with or without
trying multiple matches. Obviously, it's much easier to implement the
algorithm if you simply store the most recent index at each hash table
location; the hash table becomes a trivial data structure and you only
have to check one match candidate at each byte position.

(One option is to go out and buy the _Practical Unix Software_ book,
which allegedly discusses vdelta according to the vcdiff draft, and
might contain an implementation I could refer to... although that
might have bad intellectual property consequences. Does anyone have
this book, and can tell me how much vdelta stuff is in there? Another
option is to ask Dave Korn, but I'd rather not take up his time before
exhausting the other options.)

Thanks.
Received on Sat Oct 21 14:36:06 2006

This is an archived mail posted to the Subversion Dev mailing list.