Re: [PATCH] Speed-up of libsvn_diff using token counts

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 31 May 2011 12:44:17 +0200

On Tue, May 31, 2011 at 12:03 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Tue, May 31, 2011 at 02:53:47 +0200:
>> - Take a closer look at measuring the overhead of the token counting.
>> Maybe you can also provide some numbers here? I think a good test for
>> measuring this in practice is:
>> 1. take a very large file
>> 2. change a line in the beginning and at the end
>> (eliminates prefix/suffix scanning, making sure everything goes to LCS)
>
> That sounds roundabout. -DSUFFIX_LINES_TO_KEEP=0 ?

No, that only impacts suffix scanning (not prefix). Plus it actually
makes the suffix scanning even more thorough than usual :-). It makes
sure every line of suffix is eliminated, and not a single line is
"kept".

The SUFFIX_LINES_TO_KEEP=50 (the default) just helps to give the LCS
scanning some "wiggle room", so as to find (in most practical cases)
the same LCS as before. Without this setting, you can get different
LCSes than before.
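
To illustrate what "keeping" suffix lines means, here is a toy sketch. It
is not the libsvn_diff code (which scans the files in chunks); the
function names and the in-memory arrays of lines are assumptions made up
for this example. The `keep` parameter plays the role of
SUFFIX_LINES_TO_KEEP:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not the libsvn_diff implementation: count the identical
 * leading lines of two in-memory arrays of lines. */
size_t
count_identical_prefix(const char *const *a, size_t a_len,
                       const char *const *b, size_t b_len)
{
  size_t n = 0;

  assert(a != NULL && b != NULL);
  while (n < a_len && n < b_len && strcmp(a[n], b[n]) == 0)
    n++;
  return n;
}

/* Toy model: count the identical trailing lines that do not overlap
 * the first PREFIX lines, then hand back up to KEEP of them so that
 * they remain part of the input for the LCS step.  The return value
 * is the number of suffix lines actually eliminated. */
size_t
count_identical_suffix(const char *const *a, size_t a_len,
                       const char *const *b, size_t b_len,
                       size_t prefix, size_t keep)
{
  size_t n = 0;

  assert(a != NULL && b != NULL);
  assert(prefix <= a_len && prefix <= b_len);
  while (n < a_len - prefix && n < b_len - prefix
         && strcmp(a[a_len - 1 - n], b[b_len - 1 - n]) == 0)
    n++;
  return n > keep ? n - keep : 0;
}
```

With keep = 0 every identical trailing line is eliminated; with a larger
value some of them stay in the LCS input. If the first and the last line
differ, both counts are 0 and everything goes to the LCS.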

To programmatically disable the prefix/suffix scanning, I think it's
easiest to just comment out the following lines in
libsvn_diff/diff_file.c#datasources_open:

[[[
  SVN_ERR(find_identical_prefix(&reached_one_eof, prefix_lines,
                                files, datasources_len, file_baton->pool));

  if (!reached_one_eof)
    /* No file consisted totally of identical prefix,
     * so there may be some identical suffix. */
    SVN_ERR(find_identical_suffix(suffix_lines, files, datasources_len,
                                  file_baton->pool));
]]]

That should work correctly, because prefix_lines and suffix_lines are
already initialized to 0 at the beginning of this function (which is
the correct value if there is no prefix/suffix scanning).

(Sorry, I don't have a working copy at hand here, and I'm in the middle
of something else, so I can't make a proper patch or anything.)

Maybe a new 'knob' should be added for this, to make it easier to
(stress) test the LCS? Sorry, I don't have time to do it myself right now.
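
Without such a knob, the test from the quoted mail (a very large file
with a changed line at the beginning and at the end) only needs a pair
of input files. A minimal generator sketch follows; the function name
and the line contents are made up for illustration and are not part of
Subversion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write LINES numbered lines to OUT.  If MODIFIED
 * is non-zero, the first and the last line get different text, so the
 * result shares no identical prefix or suffix with the unmodified
 * variant.  Returns 0 on success, -1 on a write error. */
int
write_test_lines(FILE *out, size_t lines, int modified)
{
  size_t i;

  assert(out != NULL);
  for (i = 0; i < lines; i++)
    {
      int changed = modified && (i == 0 || i + 1 == lines);

      if (fprintf(out, "%s line %lu\n",
                  changed ? "changed" : "original",
                  (unsigned long) i) < 0)
        return -1;
    }
  return 0;
}
```

Calling it twice with the same line count, once with modified = 0 and
once with modified = 1, gives two files to diff against each other.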

-- 
Johan

Received on 2011-05-31 12:45:12 CEST
