Re: [PATCH] Simplification/speed-up of libsvn_diff by eliminating idx

From: Fuhrmann Stefan (ETAS/ESA1) <Stefan.Fuhrmann_at_etas.com>
Date: Fri, 27 May 2011 10:47:57 +0000

Morten Kloster wrote:

> [[[
> Simpler/faster LCS algorithm in libsvn_diff by elimination of idx.
>
>* subversion/libsvn_diff/lcs.c
> (svn_diff__snake_t): Removed idx parameter (always 0)
> (svn_diff__lcs_t): Removed idx variable (always 0) , let d have either
> sign, and moved the origo of fp by d. New version no longer chooses
> the shorter file as the "original", and can thus give different LCS's
> depending on the order of the input files even when the files have
> different lengths. Speed is ~15% faster for core lcs algorithm.
>]]]

Hi Morten,

Thank you very much for the patch. From what I have seen,
it should be correct (haven't applied & tested it, yet).

> The old version of the LCS algorithm started at k=0 and ended at
> k=d, where d is the length difference of the two files. d was required
> to be positive. It is more efficient to start at k=(-)d and end at 0 (since
> ending the loops at 0 allows for faster comparisons), and this also
> makes it easy to let d have either sign, which makes the idx variable
> unnecessary, further increasing speed by eliminating a parameter
> to the frequently called function svn_diff__snake_t. The sign of d
> is handled in the initial values of the loops (which have minimal speed
> impact).

Looks good.

> The new version will more often give different (but equally good) results
> depending on the order of the input files.

How does that affect our test suite? Is any of its expected
outputs invalid now? Is the result of blame affected as well
(e.g. if a file shrunk and / or grew in later revisions)?

> I estimated the speed
> difference by diffing all of the first 100 revisions of merge.c in
> libsvn_client against each other (when diffing only against the previous
> revision, the core LCS algorithm usually takes a minor part of the total
> running time, so this is not suitable for gauging the LCS algorithm
> speed).

The speedup is plausible. Did you test it with 32 bits?

> /* k = -1 will be the first to be used to get previous
> * position information from, make sure it holds sane
> * data
> */
> - fp[-1].position[0] = sentinel_position[0].next;
> - fp[-1].position[1] = &sentinel_position[1];
> + fp[d - 1].position[0] = sentinel_position[0].next;
> + fp[d - 1].position[1] = &sentinel_position[1];

Comment and code do not match anymore. Could you
update your patch to fix that? Also, it would be very
nice, if you added more commentary than there is today,
especially the less obvious index handling (why fp[d] will
always be valid etc.)

As I said, I like your patch and will certainly apply it as
soon as above points have been addressed.

-- Stefan^2.
Received on 2011-05-27 12:54:11 CEST

This message: [ Message body ]
Next message: Johan Corveleyn: "Re: [PATCH] Simplification/speed-up of libsvn_diff by eliminating idx"
Previous message: Daniel Shahaf: "Re: svn commit: r1104124 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/deprecated.c libsvn_subr/io.c"
Maybe in reply to: Morten Kloster: "[PATCH] Simplification/speed-up of libsvn_diff by eliminating idx"
Next in thread: Johan Corveleyn: "Re: [PATCH] Simplification/speed-up of libsvn_diff by eliminating idx"
Reply: Johan Corveleyn: "Re: [PATCH] Simplification/speed-up of libsvn_diff by eliminating idx"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]