> From: sussman@collab.net [mailto:sussman@collab.net]
> Sent: 16 May 2002 00:23
> Garrett Rooney <rooneg@electricjellyfish.net> writes:
>
> > Seriously, I think this is a hell of a good reason to start putting
> > more work into Sander's internal diff library. Being dependent on
> > external tools like this is an inherently problematic arrangement.
>
> The CollabNet folks (me, kfogel, gstein, cmpilato) don't really have
> time to spend on making Sander's code work, but I would be *thrilled*
> if someone else got it up to speed... preferably sooner rather than
> later, because Alpha is coming soon, and it would be nice to have
> a long testing period. The API is fully documented in svn_diff.h.
>
> Here's the todo list:
>
> 1. Sander already wrote an output-vtable that produces a unified diff
> between two sources. Someone needs to write an output-vtable that
> produces a 'merged' file with conflict markers when doing a 3-way
> diff. Sander and I have already discussed how to do it; the
> interface is clear. It's just a matter of someone spending a day
> or two writing it.
The merge output-vtable can be written in a few hours. That's not
really a big problem.
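To make item 1 concrete, here is a minimal C sketch of what a conflict-marking merge output-vtable could look like. It assumes the 3-way diff has already been computed into an array of hunks, each tagged as unchanged, changed only in "mine", changed only in "theirs", or conflicting. All types, callback names and the `mine`/`theirs` marker labels are hypothetical; this is not the interface declared in svn_diff.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for this sketch; they are NOT the svn_diff.h API. */
typedef struct {
  const char *const *lines; /* lines without their trailing '\n' */
  size_t count;
} line_file_t;

typedef enum { HUNK_COMMON, HUNK_MINE, HUNK_THEIRS, HUNK_CONFLICT } hunk_kind_t;

typedef struct {
  hunk_kind_t kind;
  size_t mine_start, mine_len;     /* range in the locally modified file */
  size_t theirs_start, theirs_len; /* range in the latest (incoming) file */
} hunk_t;

typedef struct {
  char *buf; /* merged output, always NUL-terminated */
  size_t cap, len;
  const line_file_t *mine, *theirs;
} merge_baton_t;

/* The output vtable: one callback per kind of 3-way hunk. */
typedef struct {
  void (*output_common)(merge_baton_t *b, const hunk_t *h);
  void (*output_mine)(merge_baton_t *b, const hunk_t *h);
  void (*output_theirs)(merge_baton_t *b, const hunk_t *h);
  void (*output_conflict)(merge_baton_t *b, const hunk_t *h);
} merge_output_fns_t;

/* Append one line plus '\n'; silently truncates when the buffer is full. */
static void emit(merge_baton_t *b, const char *line) {
  size_t room = b->cap - b->len;
  int n;
  assert(b->len < b->cap);
  n = snprintf(b->buf + b->len, room, "%s\n", line);
  assert(n >= 0);
  b->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

static void emit_range(merge_baton_t *b, const line_file_t *f,
                       size_t start, size_t len) {
  for (size_t i = start; i < start + len; i++) {
    assert(i < f->count);
    emit(b, f->lines[i]);
  }
}

/* Unchanged text is identical in every file, so copy it from "mine". */
static void merge_common(merge_baton_t *b, const hunk_t *h) {
  emit_range(b, b->mine, h->mine_start, h->mine_len);
}
static void merge_mine(merge_baton_t *b, const hunk_t *h) {
  emit_range(b, b->mine, h->mine_start, h->mine_len);
}
static void merge_theirs(merge_baton_t *b, const hunk_t *h) {
  emit_range(b, b->theirs, h->theirs_start, h->theirs_len);
}
/* Both sides changed the same region differently: keep both, marked. */
static void merge_conflict(merge_baton_t *b, const hunk_t *h) {
  emit(b, "<<<<<<< mine");
  emit_range(b, b->mine, h->mine_start, h->mine_len);
  emit(b, "=======");
  emit_range(b, b->theirs, h->theirs_start, h->theirs_len);
  emit(b, ">>>>>>> theirs");
}

static const merge_output_fns_t merge_fns = {
  merge_common, merge_mine, merge_theirs, merge_conflict
};

/* Walk the hunks in order and dispatch each one through the vtable. */
void merge_output(const merge_output_fns_t *fns, merge_baton_t *b,
                  const hunk_t *hunks, size_t nhunks) {
  assert(b->cap > 0);
  b->len = 0;
  b->buf[0] = '\0';
  for (size_t i = 0; i < nhunks; i++) {
    switch (hunks[i].kind) {
    case HUNK_COMMON:   fns->output_common(b, &hunks[i]); break;
    case HUNK_MINE:     fns->output_mine(b, &hunks[i]); break;
    case HUNK_THEIRS:   fns->output_theirs(b, &hunks[i]); break;
    case HUNK_CONFLICT: fns->output_conflict(b, &hunks[i]); break;
    }
  }
}
```

A caller fills a `merge_baton_t` with an output buffer and the two line arrays, then calls `merge_output(&merge_fns, &baton, hunks, nhunks)`. Keeping the per-hunk policy in a table of function pointers is what lets a unified-diff writer and a merge writer share the same hunk walk.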
> 2. Need to reimplement the svn_io_run_diff[3] functions as wrappers
> around the new svn_diff.h API. Pretty easy.
*nod*
> 3. Need one hell of a test suite for the library.
>
> Number 3 is the Big Problem. Even Sander himself admits that his
> algorithms have bugs in their RAM consumption. They need to be
> optimized and tested to *death*.
I have a _lot_ of local changes, because the time spent in the
code was unacceptable. Philip pointed out to me that running diff
with the --minimal option brings diff's timings into a comparable
range.
My local changes consist of the following:
- reimplementation of the LCS algorithm, using BV-HS (bit-vector
Hunt-Szymanski) instead of plain HS. This algorithm is described in
"Speeding-up Hirschberg and Hunt-Szymanski LCS Algorithms" by
Crochemore, Iliopoulos and Pinzon.
- use of a red-black tree for token/position storage.
- removal of the svn_diff__hat support functions.
- reimplementation of diff_file.c so that the file is read into
memory and lines are compared more directly. md5 was overkill and
far too time-consuming.
- breakage of diff3... :( [needs some work to get it into an
operable state again]
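For readers who have not met bit-vector LCS before, the following minimal C sketch shows the bit-parallel LCS-length recurrence from that literature (Crochemore, Iliopoulos, Pinzon and Reid describe one such algorithm). It is not Sander's BV-HS code: it leaves out the Hunt-Szymanski match lists entirely, computes only the LCS length rather than an edit script, and assumes lines have already been mapped to small integer token ids. The `BV_MAX_TOKENS` cap and the 64-token limit on `a` are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BV_MAX_TOKENS 256 /* hypothetical cap on distinct token ids */

/* Length of the LCS of token sequences a[0..m) and b[0..n), with m <= 64.
   Tokens are small integers in [0, BV_MAX_TOKENS). */
size_t lcs_length_bv(const int *a, size_t m, const int *b, size_t n) {
  uint64_t match[BV_MAX_TOKENS] = {0};
  assert(m <= 64);

  /* match[c] has bit i set iff a[i] == c. */
  for (size_t i = 0; i < m; i++) {
    assert(a[i] >= 0 && a[i] < BV_MAX_TOKENS);
    match[a[i]] |= (uint64_t)1 << i;
  }

  /* v starts as all ones; every zero bit in its low m bits accounts for
     one unit of LCS length so far. */
  uint64_t v = ~(uint64_t)0;
  for (size_t j = 0; j < n; j++) {
    assert(b[j] >= 0 && b[j] < BV_MAX_TOKENS);
    uint64_t u = v & match[b[j]];
    v = (v + u) | (v - u); /* unsigned overflow wraps, as intended */
  }

  size_t zeros = 0;
  for (size_t i = 0; i < m; i++)
    if (!(v & ((uint64_t)1 << i)))
      zeros++;
  return zeros;
}
```

Each token of `b` updates all `m` columns of the LCS table with a constant number of word operations, which is where the speed-up over a cell-by-cell dynamic program comes from. Sequences longer than 64 tokens would need multi-word vectors with the addition's carry propagated between words.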
> Maybe someone will get inspired here... I hate having external
> dependencies, especially on Win32.
I will have time sometime next week to continue work on it. One of
the things I want to try is implementing the algorithm described in
"An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers and
Miller.
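For reference, the core of the Wu, Manber, Myers and Miller O(NP) method fits in a few dozen lines. The C sketch below computes only the insert/delete edit distance D = delta + 2P (delta is the length difference, P the number of deletions), not the edit script itself, and again assumes lines have already been reduced to integer tokens. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static ptrdiff_t max_pd(ptrdiff_t p, ptrdiff_t q) { return p > q ? p : q; }

/* Slide down diagonal k (x = y - k) from row y while tokens match and
   return the furthest row reached. */
static ptrdiff_t onp_snake(const int *a, ptrdiff_t m, const int *b,
                           ptrdiff_t n, ptrdiff_t k, ptrdiff_t y) {
  ptrdiff_t x = y - k;
  while (x < m && y < n && a[x] == b[y]) {
    x++;
    y++;
  }
  return y;
}

/* Insert/delete edit distance D between a[0..m) and b[0..n), computed with
   the O(NP) method. Returns -1 if allocation fails. */
ptrdiff_t onp_edit_distance(const int *a, ptrdiff_t m,
                            const int *b, ptrdiff_t n) {
  if (m > n) { /* the method needs the first sequence to be the shorter */
    const int *ts = a; a = b; b = ts;
    ptrdiff_t tn = m; m = n; n = tn;
  }
  ptrdiff_t delta = n - m;

  /* fp[k] for diagonals k in [-(m+1), n+1]; -1 means "not reached yet". */
  ptrdiff_t *buf = malloc((size_t)(m + n + 3) * sizeof *buf);
  if (buf == NULL)
    return -1;
  ptrdiff_t *fp = buf + (m + 1);
  for (ptrdiff_t k = -(m + 1); k <= n + 1; k++)
    fp[k] = -1;

  ptrdiff_t p = -1;
  do {
    p++;
    /* Diagonals below delta, swept upward. */
    for (ptrdiff_t k = -p; k < delta; k++)
      fp[k] = onp_snake(a, m, b, n, k, max_pd(fp[k - 1] + 1, fp[k + 1]));
    /* Diagonals above delta, swept downward. */
    for (ptrdiff_t k = delta + p; k > delta; k--)
      fp[k] = onp_snake(a, m, b, n, k, max_pd(fp[k - 1] + 1, fp[k + 1]));
    /* Finally delta itself, which uses both freshly computed neighbours. */
    fp[delta] = onp_snake(a, m, b, n, delta,
                          max_pd(fp[delta - 1] + 1, fp[delta + 1]));
  } while (fp[delta] != n);

  free(buf);
  return delta + 2 * p; /* D = delta + 2P */
}
```

Here `fp[k]` holds the furthest row reached on diagonal k. Because each round only widens the band of diagonals from -p to delta + p, the work grows with P rather than with D, so it stays cheap when one file is mostly the other plus insertions.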
Sander
Received on Thu May 16 00:56:52 2002