
Re: [PATCH] Speed-up of libsvn_diff using token counts

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 31 May 2011 02:53:47 +0200

On Mon, May 30, 2011 at 8:50 PM, Morten Kloster <morklo_at_gmail.com> wrote:
> Johan, any progress on reviewing the code on your part? Things are
> a bit simpler now with the idx patch in: Since that patch settled the
> file (re)order issue, this patch now produces fully identical results to

Thanks for the updated patch, it applies cleanly to HEAD of trunk indeed.

I've done some tests, and they look good. The result of my 2200-rev
blame is identical before and after this patch (the idx patch caused
some minor changes, but at first sight nothing drastic -- a few
seemingly unimportant lines were attributed to different revs).

With my usual blame testcase, I use the -x-b option (ignore changes in
amount of whitespace). This helps the blame speed tremendously in this
case, because some of the revisions in the history converted
spaces->tabs and vice versa. With this test, I could see the slight
overhead of the token counting of your patch (2%-3%), because there
are not that many "one-sided" lines. So it's slightly slower than

However, when running the 2200-rev blame without ignore-options, I can
see a *huge* improvement thanks to your patch. This test used to take
over an hour. Now it's finished in 1m30s, as fast as the test with
-x-b. For the revs that change spaces->tabs all lines are "one-sided".
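
To illustrate why "one-sided" lines matter, here is a minimal Python
sketch of the general idea: count how often each line occurs in each
file, and set aside lines that occur in only one of them, since they
can never be part of a common subsequence. This is not the libsvn_diff
code; the function name split_one_sided and the sample lines are
hypothetical, and the real patch works on tokens inside the C library.

```python
from collections import Counter


def split_one_sided(a_lines, b_lines):
    """Keep only lines that occur in both files (hypothetical helper).

    Lines seen in just one file are "one-sided": no LCS can match them,
    so they can be dropped before the expensive LCS step.
    """
    count_a = Counter(a_lines)
    count_b = Counter(b_lines)
    # Counter returns 0 for missing keys, so a plain lookup is enough.
    a_candidates = [ln for ln in a_lines if count_b[ln] > 0]
    b_candidates = [ln for ln in b_lines if count_a[ln] > 0]
    return a_candidates, b_candidates


# A spaces->tabs revision: every indented line differs between the files.
old = ["    int x;", "    return x;", "}"]
new = ["\tint x;", "\treturn x;", "}"]
a_cand, b_cand = split_one_sided(old, new)
```

In this toy case only the unindented "}" line survives on each side, so
almost nothing is left for the LCS to compare. With -x-b the indented
lines would compare equal instead, and none of them would be one-sided.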

For real work, I prefer to use the -x-b option (I'm not interested in
who changed whitespace), but it's interesting to see the power of this

Anyway, I'm currently running the entire test-suite on your patch. So
far no problem. But before committing it I'd still want to do two things:

- Take a closer look at measuring the overhead of the token counting.
Maybe you can also provide some numbers here? I think a good test for
measuring this in practice is:
  1. take a very large file
  2. change a line in the beginning and at the end
     (eliminates prefix/suffix scanning, making sure everything goes to LCS)
  3. diff those two
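
The steps above could be prepared with something like the following
Python sketch. It only generates the two input files; the file names,
line contents, and the helper make_test_pair are assumptions made for
this example, and the diff itself would still be run and timed with the
patched and unpatched builds.

```python
import os
import tempfile


def make_test_pair(directory, n_lines=1_000_000):
    """Write a large file plus a copy whose first and last lines differ.

    Changing both ends means prefix/suffix scanning cannot strip
    anything, so the whole file has to go through the LCS code.
    """
    lines = [f"line {i}\n" for i in range(n_lines)]
    modified = list(lines)
    modified[0] = "changed first line\n"
    modified[-1] = "changed last line\n"

    path_a = os.path.join(directory, "original.txt")
    path_b = os.path.join(directory, "modified.txt")
    with open(path_a, "w") as f:
        f.writelines(lines)
    with open(path_b, "w") as f:
        f.writelines(modified)
    return path_a, path_b


# Small size here so the sketch runs quickly; use the default (or more)
# for an actual timing run.
tmp_dir = tempfile.mkdtemp()
path_a, path_b = make_test_pair(tmp_dir, n_lines=1000)
```

Note that every line in this generated file is unique, which may not be
representative of real source files with many repeated lines.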

- Take a closer look at the code. I've skimmed through it, and it
looked good, but I need to go for a second pass. Right now (almost
3 am) I really need to get some sleep first :-).

Received on 2011-05-31 02:54:38 CEST

This is an archived mail posted to the Subversion Dev mailing list.
