On Wed, Feb 9, 2011 at 10:20 AM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> In other news, I looked into the cause --- tried to make
> datasource_get_next_token() do one more loop in the place where
> currently it does 'return if at_start_of_suffix()' --- but that didn't
> fix the truncation...
Indeed, that won't fix it.
I think the best (cleanest, most robust, least special-cased) approach
is to leave the suffix stripping as it is (and let
datasource_get_next_token() stop when it encounters the suffix), but
to count the number of suffix lines while scanning them (like we do
for prefix). Then that suffix_lines result can be passed around, like
the prefix_lines, so it ends up in the call to lcs.c#svn_diff__lcs,
where a piece of "suffix_lcs" can be added to the lcs chain (just like
the prefix_lcs is prepended to it). If we get there, the rest will
work automagically :-).
The only drawback of this is a minor loss of performance of the
optimization, since we now have to count newlines while scanning the
suffix (i.e. need to compare every byte scanned). But I can live with
that :-). I'll see how much impact this has once I get it working.
I've done part of this locally (adapting the function parameters and
passing the value around), but still have to do the hard parts:
- actually counting the newlines in find_identical_suffix
- create and append the suffix_lcs in lcs.c#svn_diff__lcs.
It might take me a couple more days to finish that.
> In the meantime, I tweaked a test to make it XFail (r1068798).  From
> a quick glance it seemed to be relevant, but in second thought I'll
> admit I didn't study the test thoroughly before making the patch.
Great, thanks. The fact that it's random is a bit annoying, but at
least it's something. I was thinking of writing a specific merge test
(or another one in diff-diff3-test.c) with say 100 lines of identical
suffix. But for now I don't have the time to do that. Maybe later ...
-- 
Johan
Received on 2011-02-09 13:32:21 CET