[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Sat, 1 Jan 2011 21:37:03 +0100

[ Taking a privately-started discussion with danielsh to the list, in
case others have inspiration/insight about this. Question at hand: I'm
having trouble making diff3 work with prefix/suffix scanning of the
diff-optimizations-bytes branch. Any feedback is highly appreciated
:-). ]

On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
[...snip...]
> In diff3 with prefix enabled, I was seeing failures like this:
>
> EXPECTED:
> line1
> <<<
> line2
> ===
> line2 modified
>>>>
>
> ACTUAL:
> <<<
> line1
> line2
> ===
> line1
> line2 modified
>>>>
>
> so I assumed that when the prefix code gobbled the shared prefix.  How
> to fix it from there was purely guesswork on my part (and I haven't
> implemented it).
>
> These failures would go away if I prepended a "line0 left" and "line0
> right" to the file.

Yes, I know. That's indeed because the prefix-scanning code gobbled up
the identical prefix. That problem is also there in diff2.

In diff2, I fixed this by simply pre-pending a piece of "common" diff
to the diff linked list, in the function diff.c#svn_diff__diff.

From r1037371:
[[[
--- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
(original)
+++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
Sun Nov 21 02:36:07 2010
@@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
  svn_diff_t *diff;
  svn_diff_t **diff_ref = &diff;

+ if (want_common && (original_start > 1))
+ {
+ /* we have a prefix to skip */
+ (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
+
+ (*diff_ref)->type = svn_diff__type_common;
+ (*diff_ref)->original_start = 0;
+ (*diff_ref)->original_length = original_start - 1;
+ (*diff_ref)->modified_start = 0;
+ (*diff_ref)->modified_length = modified_start - 1;
+ (*diff_ref)->latest_start = 0;
+ (*diff_ref)->latest_length = 0;
+
+ diff_ref = &(*diff_ref)->next;
+ }
+
  while (1)
    {
      if (original_start < lcs->position[0]->offset
]]]

Naively, I wanted to do the same for diff3 (along with some other
tricks, to make the algorithm aware of the prefix_lines), but it
doesn't work. See the patch in attachment.

I mean: it works partly: diff-diff3-test.exe passes with the attached
patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
merge_tests.py (61) don't. I haven't studied the test failures in
detail (I probably should, but I currently can't wrap my head around
those tests).

I'm now pondering another approach, to pre-pend an "lcs" piece to the
lcs linked list, and let the rest of the algorithm do its work as
normal. Inspiration for this came when I was reading the code of
diff3.c#svn_diff__resolve_conflict, where a similar trick is used
around line 121:
[[[
    /* If there were matching symbols at the start of
     * both sequences, record that fact.
     */
    if (common_length > 0)
      {
        lcs = apr_palloc(subpool, sizeof(*lcs));
        lcs->next = NULL;
        lcs->position[0] = start_position[0];
        lcs->position[1] = start_position[1];
        lcs->length = common_length;

        lcs_ref = &lcs->next;
      }
]]]

Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
I'm already passing the prefix_lines as a new parameter. If that would
work out, I could probably remove the special common-diff-injecting
code in diff.c#svn_diff__diff.

Thoughts?

It will take me some time/experimentation to work out the details, but
I think that should work ...

Cheers,

-- 
Johan

Received on 2011-01-01 21:38:01 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.