Re: diff4-optimization-bytes

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 9 Feb 2011 13:31:22 +0100

On Wed, Feb 9, 2011 at 10:20 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> In other news, I looked into the cause --- tried to make
> datasource_get_next_token() do one more loop in the place where
> currently it does 'return if at_start_of_suffix()' --- but that didn't
> fix the truncation...

Indeed, that won't fix it.

I think the best (cleanest, most robust, least special-cased) approach
is to leave the suffix stripping as it is (and let
datasource_get_next_token() stop when it encounters the suffix), but
to count the number of suffix lines while scanning them (like we do
for prefix). Then that suffix_lines result can be passed around, like
the prefix_lines, so it ends up in the call to lcs.c#svn_diff__lcs,
where a piece of "suffix_lcs" can be added to the lcs chain (just like
the prefix_lcs is prepended to it). If we get there, the rest will
work automagically :-).

The only drawback of this is a minor loss of performance of the
optimization, since we now have to count newlines while scanning the
suffix (i.e. need to compare every byte scanned). But I can live with
that :-). I'll see how much impact this has once I get it working.

I've done part of this locally (adapting the function parameters and
passing the value around), but still have to do the hard parts:
- actually counting the newlines in find_identical_suffix
- create and append the suffix_lcs in lcs.c#svn_diff__lcs.

It might take me a couple more days to finish that.

> In the meantime, I tweaked a test to make it XFail (r1068798). From
> a quick glance it seemed to be relevant, but in second thought I'll
> admit I didn't study the test thoroughly before making the patch.

Great, thanks. The fact that it's random is a bit annoying, but at
least it's something. I was thinking of writing a specific merge test
(or another one in diff-diff3-test.c) with say 100 lines of identical
suffix. But for now I don't have the time to do that. Maybe later ...

-- 
Johan

Received on 2011-02-09 13:32:21 CET

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: [PATCH] Fix for Issue #3781 (Case Sensitive Authz)"
Previous message: Arwin Arni: "[PATCH] Fix for Issue #3781 (Case Sensitive Authz)"
In reply to: Daniel Shahaf: "Re: diff4-optimization-bytes"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]