
Re: svn diff optimization to make blame faster?

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Mon, 20 Sep 2010 13:10:48 +0200

On Mon, Sep 20, 2010 at 11:52 AM, Branko Čibej <brane_at_xbc.nu> wrote:
>  On 15.09.2010 14:20, Johan Corveleyn wrote:
>> Some update on this: I have implemented this for svn_diff (excluding
>> the identical prefix and suffix of both files, and only then starting
>> to fill up the token tree and let the LCS algorithm do its thing). It
>> makes a *huge* difference. On my bigfile.xml (1.5 Mb) with only one
>> line changed, the call to svn_diff_diff is ~10 times faster (15-20 ms
>> vs. 150-170 ms).
>
>
> Hmmm ... looks to me like test data tailored to the optimization. :)

Nope, that's real data from a real repository, with a normal kind of
change that happens here every day.

Of course this optimization is most effective if there are a lot of
common prefix/suffix lines. If there is a single change in the first
line, and a single change in the last one, this optimization will do
nothing but introduce a little bit of extra overhead. And it will
obviously make the most impact on large files (in fact the gain simply
scales with the ratio of the "number of common prefix/suffix lines" to
the "number of lines in between").
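
To make the idea concrete, here is a minimal sketch in Python (not the actual svn_diff C code, which works on tokens and file chunks). The function name strip_common_affixes and the list-of-lines representation are assumptions made for illustration only. It shows how the identical prefix and suffix could be skipped so that only the middle part is handed to the LCS step:

```python
def strip_common_affixes(a, b):
    """Skip the identical prefix and suffix of two lists of lines.

    Returns (prefix_len, suffix_len, middle_a, middle_b). Only the two
    "middle" lists would still need to go through the LCS algorithm.
    Hypothetical helper for illustration; not part of svn_diff.
    """
    limit = min(len(a), len(b))

    # Count identical lines from the start.
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1

    # Count identical lines from the end, without running back into
    # the prefix that was already matched.
    suffix = 0
    while (suffix < limit - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1

    return prefix, suffix, a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


old = ["<root>", "<a/>", "<b/>", "<c/>", "</root>"]
new = ["<root>", "<a/>", "<B/>", "<c/>", "</root>"]
prefix, suffix, mid_old, mid_new = strip_common_affixes(old, new)
print(prefix, suffix, mid_old, mid_new)
```

With one changed line in the middle, only a single line per file is left for the LCS step. If the first and last lines both differ, prefix and suffix are both 0 and the scan is pure overhead, which is the worst case described above.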

I'm sorry it's taking me longer than expected to post a version of this
to the list, but I'm still having some problems with a couple of edge
conditions (I'm learning C as I go, and I'm struggling with a couple
of pointer calculations/comparisons). I plan to post something during
this week...

Cheers,

-- 
Johan
Received on 2010-09-20 13:11:43 CEST

This is an archived mail posted to the Subversion Dev mailing list.
