Re: [PATCH] Speed-up of libsvn_diff using token counts

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 31 May 2011 02:53:47 +0200

On Mon, May 30, 2011 at 8:50 PM, Morten Kloster <morklo_at_gmail.com> wrote:
> Johan, any progress on reviewing the code on your part? Things are
> a bit simpler now with the idx patch in: Since that patch settled the
> file (re)order issue, this patch now produces fully identical results to
> HEAD.

Thanks for the updated patch, it applies cleanly to HEAD of trunk indeed.

I've done some tests, and they look good. The result of my 2200-rev
blame is identical before and after this patch (the idx patch caused
some minor changes, but at first sight nothing drastic -- seemingly
unimportant lines, which were switched between revs they were
attributed to).

With my usual blame testcase, I use the -x-b option (ignore changes in
amount of whitespace). This helps the blame speed tremendously in this
case, because some of the revisions in the history have been
spaces->tabs and vice-versa. With this test, I could see the slight
overhead of the token counting of your patch (2%-3%), because there
are not that many "one-sided" lines. So it's slightly slower than
before.

However, when running the 2200-rev blame without ignore-options, I can
see a *huge* improvement thanks to your patch. This test used to take
over an hour. Now it's finished in 1m30s, as fast as the test with
-x-b. For the revs that change spaces->tabs all lines are "one-sided".
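
To illustrate what "one-sided" means here, a minimal sketch in Python. It assumes a "one-sided" line is one whose token occurs in only one of the two files, so it can never be part of a common subsequence. The function name one_sided_lines is hypothetical and this is not the libsvn_diff code, only a picture of the idea:

```python
from collections import Counter


def one_sided_lines(lines_a, lines_b):
    """Return the lines of each input that never occur in the other input.

    Assumption: this is what "one-sided" refers to. Such lines cannot
    match anything, so an LCS search could skip them up front.
    """
    counts_a = Counter(lines_a)
    counts_b = Counter(lines_b)
    # Counter returns 0 for a missing key, so no membership test is needed.
    only_a = [line for line in lines_a if counts_b[line] == 0]
    only_b = [line for line in lines_b if counts_a[line] == 0]
    return only_a, only_b


# A spaces->tabs revision compared without -x-b: no line has a counterpart.
spaces = ["    x = 1\n", "    y = 2\n"]
tabs = ["\tx = 1\n", "\ty = 2\n"]
only_a, only_b = one_sided_lines(spaces, tabs)
```

In this example every line ends up in only_a or only_b, which matches the spaces->tabs case described above.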

For real work, I prefer to use the -x-b option (I'm not interested in
who changed whitespace), but it's interesting to see the power of this
patch.

Anyway, I'm currently running the entire test-suite on your patch. So
far no problem. But before committing it I'd still want to do two
things:

- Take a closer look at measuring the overhead of the token counting.
Maybe you can also provide some numbers here? I think a good test for
measuring this in practice is:
  1. take a very large file
  2. change a line in the beginning and at the end
     (eliminates prefix/suffix scanning, making sure everything goes to LCS)
  3. diff those two
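
The three steps above could be set up with something like the following Python sketch. The function name, file names and default line count are made up for illustration; only the shape of the test (large file, first and last line changed) comes from the description above:

```python
import os


def make_benchmark_pair(directory, num_lines=1_000_000):
    """Write two large files that differ only in their first and last line.

    Changing both ends leaves no identical prefix or suffix to strip,
    so the whole file has to go through the LCS stage.
    """
    original = [f"line {i}\n" for i in range(num_lines)]
    modified = list(original)
    modified[0] = "changed first line\n"
    modified[-1] = "changed last line\n"

    path_a = os.path.join(directory, "original.txt")
    path_b = os.path.join(directory, "modified.txt")
    with open(path_a, "w") as f:
        f.writelines(original)
    with open(path_b, "w") as f:
        f.writelines(modified)
    return path_a, path_b
```

Timing the diff of the two resulting paths with a build before and after the patch should then show the token counting overhead in isolation.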

- Take a closer look at the code. I've skimmed through it, and it
looked good. I still need to make a second pass, but right now (almost
3 am) I really need to get some sleep first :-).

-- 
Johan

Received on 2011-05-31 02:54:38 CEST
