Philip Martin <philip@codematters.co.uk> writes:
Eek!  I don't know what went wrong, but the body of my last mail
appears to have gone walkies :-(  Here is what I meant to say...
"Sander Striker" <striker@apache.org> writes:
> I will have time somewhere next week to continue work on it.  One
> of the things I want to try is to implement the algorithm described
> in "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers and
> Miller.
I have some old C++ code that implements O(NP). I was porting this to
Subversion when Sander's code appeared. He and I discussed our
approaches and I decided to stop work on my code.  Sander contacted me
yesterday, asking about the performance of O(NP). This prompted me to
look out my old code and test it against GNU diff.  At first my code
used a vtable function based approach to tokens, a bit like the one in
svn_diff.h.  However that proved to be a performance killer, the
algorithm has a tight loop comparing tokens and the function call
overhead is significant.  Yesterday I stripped out the token
interface, moving to an array interface (which is also more like GNU
diff) and added some simple line hashing. It now runs almost as fast
as GNU diff --minimal, its about 5% slower.
While the token/vtable interface in svn_diff.h provides a generic diff
capability, I think it may be a performance limitation. My original
C++ code was template based, the "function" interface was non-virtual
and could be inlined. It will be interesting to see if the algorithm
used by Sander's code is similarly affected by this interface.
It may be that Subversion needs different algorithms depending on the
size of the file. A minimal match algorithm for small/medium files to
reduce the number of conflicts, and a non-minimal algorithm for large
files to get better speed/memory performance.
Anyway here's my code(*). It contains the "library" code implementing
svn_diff/svn_merge, and a command line test harness.  It's certainly
not fully tested, but here's the performance on some "hard" files,
ones with 10000 lines chosen from a set of 500 different lines, i.e.,
lots of matching lines.
% time diff --minimal foo bar > /dev/null
real    0m1.894s
user    0m1.890s
sys     0m0.010s
% time ./svn-diff foo bar > /dev/null 
real    0m1.971s
user    0m1.970s
sys     0m0.020s
-- 
Philip
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu May 16 14:25:16 2002