
RE: Problem invoking diff3.

From: Sander Striker <striker_at_apache.org>
Date: 2002-05-16 01:03:10 CEST

> From: sussman@collab.net [mailto:sussman@collab.net]
> Sent: 16 May 2002 00:23

> Garrett Rooney <rooneg@electricjellyfish.net> writes:
>
> > seriously, i think this is a hell of a good reason to start putting
> > more work into sander's internal diff library. being dependent on
> > external tools like this is an inherently problematic arrangement.
>
> The Collabnet folks (me, kfogel, gstein, cmpilato) don't really have
> time to spend on making Sander's code work, but I would be *thrilled*
> if someone else got it up to speed... preferably sooner rather than
> later, because Alpha is approaching soon, and it would be nice to have
> a long testing period. The API is fully documented in svn_diff.h.
>
> Here's the todo list:
>
> 1. Sander already wrote an output-vtable that produces unified diff
> between two sources. Someone needs to write an output-vtable to
> produce a 'merged' file with conflict markers when doing a 3 way
> diff. Sander and I have already discussed how to do it; the
> interface is clear. It's just a matter of someone spending a day
> or two writing it.

The merge table can be written in a few hours. That's not really a
big problem.
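
To make the shape of such a merge table concrete, here is a minimal
sketch in C. It is not the svn_diff.h interface: the types hunk_t,
output_fns_t and merge_baton_t, the function names, and the marker
labels are all hypothetical, made up for illustration. It assumes the
three-way diff has already classified each region as common, changed
on one side only, or conflicting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hunk kinds a three-way diff might report. */
typedef enum { HUNK_COMMON, HUNK_MODIFIED, HUNK_LATEST, HUNK_CONFLICT } hunk_kind;

/* Hypothetical hunk: the text each source contributes for one region. */
typedef struct {
    hunk_kind kind;
    const char *original;
    const char *modified;
    const char *latest;
} hunk_t;

/* Hypothetical output vtable: one callback per hunk kind. */
typedef struct {
    void (*output_common)(void *baton, const hunk_t *h);
    void (*output_modified)(void *baton, const hunk_t *h);
    void (*output_latest)(void *baton, const hunk_t *h);
    void (*output_conflict)(void *baton, const hunk_t *h);
} output_fns_t;

/* Baton for the merge table: a fixed-size buffer collecting the result. */
typedef struct { char text[1024]; size_t len; } merge_baton_t;

/* Append s to the buffer, truncating rather than overflowing. */
static void append(merge_baton_t *b, const char *s)
{
    size_t n = strlen(s);
    if (b->len + n >= sizeof b->text)
        n = sizeof b->text - 1 - b->len;
    memcpy(b->text + b->len, s, n);
    b->len += n;
    b->text[b->len] = '\0';
}

static void merge_common(void *baton, const hunk_t *h)   { append(baton, h->original); }
static void merge_modified(void *baton, const hunk_t *h) { append(baton, h->modified); }
static void merge_latest(void *baton, const hunk_t *h)   { append(baton, h->latest); }

/* Both sides changed the same region: emit both, wrapped in markers. */
static void merge_conflict(void *baton, const hunk_t *h)
{
    append(baton, "<<<<<<< modified\n");
    append(baton, h->modified);
    append(baton, "=======\n");
    append(baton, h->latest);
    append(baton, ">>>>>>> latest\n");
}

/* The merge table itself: just a set of callbacks. */
const output_fns_t merge_fns = {
    merge_common, merge_modified, merge_latest, merge_conflict
};

/* Driver: walk the hunks and dispatch each one through the vtable. */
void output_hunks(const hunk_t *hunks, size_t n,
                  const output_fns_t *fns, void *baton)
{
    assert(fns != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (hunks[i].kind) {
        case HUNK_COMMON:   fns->output_common(baton, &hunks[i]);   break;
        case HUNK_MODIFIED: fns->output_modified(baton, &hunks[i]); break;
        case HUNK_LATEST:   fns->output_latest(baton, &hunks[i]);   break;
        case HUNK_CONFLICT: fns->output_conflict(baton, &hunks[i]); break;
        }
    }
}
```

The point of the vtable split is that the same driver can feed a
unified-diff table or a merge table; only the callbacks differ.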
 
> 2. Need to wrap svn_io_run_diff[3] functions around the new
> svn_diff.h API. Pretty easy.

*nod*

> 3. Need one hell of a test suite for the library.
>
> Number 3 is the Big Problem. Even Sander himself admits that there
> are bugs in RAM consumption in his algorithms. They need to be
> optimized and tested to *death*.

I have some (a _lot_ of) local changes, because the time spent in
the code was unacceptable. Philip pointed out to me that running
diff with the --minimal option brings diff's run times closer to a
range where the two can be compared.

My local changes consist of the following:

 - reimplementation of the LCS algorithm, doing BV-HS instead of
   HS. This algorithm is described in "Speeding-up Hirschberg and
   Hunt-Szymanski LCS Algorithms" by Crochemore, Iliopoulos and Pinzon.

 - use of a red-black tree for token/position storage.

 - removal of the svn_diff__hat support functions.

 - reimplementation of diff_file.c such that the file is read into
   memory and lines are compared in a more direct fashion. md5 was
   overkill and way too time consuming.

 - breakage of diff3... :( [needs some work to get it in operable
   state again]
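
As an illustration of comparing in-memory lines directly, here is a
small C sketch. It is not the diff_file.c code: line_t, split_lines,
line_equal and lcs_length are hypothetical names, and the LCS shown is
the textbook O(n*m) dynamic-programming version rather than HS or
BV-HS. It only shows where a direct line comparison (a length check,
then memcmp) slots in where a digest comparison might otherwise sit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: a line held in memory as pointer plus length. */
typedef struct { const char *data; size_t len; } line_t;

/* Direct comparison: cheap length check first, then a byte compare. */
int line_equal(const line_t *a, const line_t *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

/* Split an in-memory buffer into lines (each newline stays with its
   line). Returns the number of lines stored, at most max_lines. */
size_t split_lines(const char *buf, size_t size, line_t *out, size_t max_lines)
{
    size_t count = 0, start = 0;
    assert(out != NULL);
    while (start < size && count < max_lines) {
        const char *nl = memchr(buf + start, '\n', size - start);
        size_t end = nl ? (size_t)(nl - buf) + 1 : size;
        out[count].data = buf + start;
        out[count].len = end - start;
        count++;
        start = end;
    }
    return count;
}

/* Textbook dynamic-programming LCS length over lines, keeping only two
   rows of the table. Returns (size_t)-1 on allocation failure. */
size_t lcs_length(const line_t *a, size_t n, const line_t *b, size_t m)
{
    size_t *prev = calloc(m + 1, sizeof *prev);
    size_t *curr = calloc(m + 1, sizeof *curr);
    size_t result;

    if (!prev || !curr) {
        free(prev);
        free(curr);
        return (size_t)-1;
    }
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (line_equal(&a[i - 1], &b[j - 1]))
                curr[j] = prev[j - 1] + 1;
            else
                curr[j] = prev[j] > curr[j - 1] ? prev[j] : curr[j - 1];
        }
        /* The finished row becomes the previous row for the next pass. */
        size_t *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

The length check rejects most unequal lines without touching their
bytes, which is the kind of shortcut a digest-based comparison cannot
take until both digests have been computed.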

> Maybe someone will get inspired here... I hate having external
> dependencies, especially on Win32.

I will have time sometime next week to continue work on it. One
of the things I want to try is to implement the algorithm described
in "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers and
Miller.
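
For reference, the core of that algorithm is short. The following is a
minimal C sketch of it as I understand the paper, working on
characters of two strings rather than on lines, and returning only the
edit distance (insertions plus deletions), not the edit script. The
names onp_edit_distance, snake and max_int are made up here; nothing
in it is taken from the svn_diff code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Slide down diagonal k starting at row y while characters match;
   returns the furthest y reached. x indexes a, y indexes b. */
static int snake(const char *a, int m, const char *b, int n, int k, int y)
{
    int x = y - k;
    while (x < m && y < n && a[x] == b[y]) {
        x++;
        y++;
    }
    return y;
}

static int max_int(int p, int q) { return p > q ? p : q; }

/* Number of insertions plus deletions needed to turn a into b.
   Returns -1 on allocation failure. */
int onp_edit_distance(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    int m = (int)strlen(a), n = (int)strlen(b);

    /* The algorithm wants a to be the shorter sequence. */
    if (m > n) {
        const char *ts = a; a = b; b = ts;
        int ti = m; m = n; n = ti;
    }

    int delta = n - m;       /* diagonal on which the end point lies */
    int offset = m + 1;      /* shifts diagonals -(m+1)..(n+1) to 0.. */
    int size = m + n + 3;
    int *fp = malloc((size_t)size * sizeof *fp);
    if (!fp)
        return -1;
    for (int i = 0; i < size; i++)
        fp[i] = -1;          /* furthest point not yet known */

    int p = -1;
    do {
        p++;
        /* Diagonals below delta, from the outside in. */
        for (int k = -p; k <= delta - 1; k++)
            fp[k + offset] = snake(a, m, b, n, k,
                max_int(fp[k - 1 + offset] + 1, fp[k + 1 + offset]));
        /* Diagonals above delta, from the outside in. */
        for (int k = delta + p; k >= delta + 1; k--)
            fp[k + offset] = snake(a, m, b, n, k,
                max_int(fp[k - 1 + offset] + 1, fp[k + 1 + offset]));
        /* Finally the delta diagonal itself. */
        fp[delta + offset] = snake(a, m, b, n, delta,
            max_int(fp[delta - 1 + offset] + 1, fp[delta + 1 + offset]));
    } while (fp[delta + offset] != n);

    free(fp);
    return delta + 2 * p;
}
```

The outer loop runs once per value of P (the number of deletions from
the shorter sequence), which is where the O(NP) bound comes from;
inputs that differ little finish after very few rounds.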

Sander

Received on Thu May 16 00:56:52 2002

This is an archived mail posted to the Subversion Dev mailing list.