[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Wed, 05 Jan 2011 11:11:01 +0000

On Sat, 2011-01-01, Johan Corveleyn wrote:
> [ Taking a privately-started discussion with danielsh to the list, in
> case others have inspiration/insight about this. Question at hand: I'm
> having trouble making diff3 work with prefix/suffix scanning of the
> diff-optimizations-bytes branch. Any feedback is highly appreciated
> :-). ]
>
> On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> [...snip...]
> > In diff3 with prefix enabled, I was seeing failures like this:
> >
> > EXPECTED:
> > line1
> > <<<
> > line2
> > ===
> > line2 modified
> >>>>
> >
> > ACTUAL:
> > <<<
> > line1
> > line2
> > ===
> > line1
> > line2 modified
> >>>>
> >
> > so I assumed that when the prefix code gobbled the shared prefix. How
> > to fix it from there was purely guesswork on my part (and I haven't
> > implemented it).
> >
> > These failures would go away if I prepended a "line0 left" and "line0
> > right" to the file.
>
> Yes, I know. That's indeed because the prefix-scanning code gobbled up
> the identical prefix. That problem is also there in diff2.
>
> In diff2, I fixed this by simply pre-pending a piece of "common" diff
> to the diff linked list, in the function diff.c#svn_diff__diff.
>
> >From r1037371:
> [[[
> --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
> (original)
> +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
> Sun Nov 21 02:36:07 2010
> @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
> svn_diff_t *diff;
> svn_diff_t **diff_ref = &diff;
>
> + if (want_common && (original_start > 1))
> + {
> + /* we have a prefix to skip */
> + (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
> +
> + (*diff_ref)->type = svn_diff__type_common;
> + (*diff_ref)->original_start = 0;
> + (*diff_ref)->original_length = original_start - 1;
> + (*diff_ref)->modified_start = 0;
> + (*diff_ref)->modified_length = modified_start - 1;
> + (*diff_ref)->latest_start = 0;
> + (*diff_ref)->latest_length = 0;
> +
> + diff_ref = &(*diff_ref)->next;
> + }

Hi Johan.

I know this is work in progress but that whole "if" block of code is a
good example of where a comment above the block would be *really* useful
to tell the reader what it's for. :-)

(Also, all that dereferencing is unnecessary in that particular block,
but I looked at the rest of the function and saw that that style is used
in two other blocks where it is not unnecessary. Personally I would
find the following style more readable throughout the function:

       {
         svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff));
         new_diff->type = ...
         ...

         *diff_ref = new_diff;
         diff_ref = &new_diff->next;
       }
)

> while (1)
> {
> if (original_start < lcs->position[0]->offset
> ]]]
>
> Naively, I wanted to do the same for diff3 (along with some other
> tricks, to make the algorithm aware of the prefix_lines), but it
> doesn't work. See the patch in attachment.
>
> I mean: it works partly: diff-diff3-test.exe passes with the attached
> patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
> merge_tests.py (61) don't. I haven't studied the test failures in
> detail (I probably should, but I currently can't wrap my head around
> those tests).
>
> I'm now pondering another approach, to pre-pend an "lcs" piece to the
> lcs linked list, and let the rest of the algorithm do its work as
> normal. Inspiration for this came when I was reading the code of
> diff3.c#svn_diff__resolve_conflict, where a similar trick is used
> around line 121:
> [[[
> /* If there were matching symbols at the start of
> * both sequences, record that fact.
> */
> if (common_length > 0)
> {
> lcs = apr_palloc(subpool, sizeof(*lcs));
> lcs->next = NULL;
> lcs->position[0] = start_position[0];
> lcs->position[1] = start_position[1];
> lcs->length = common_length;
>
> lcs_ref = &lcs->next;
> }
> ]]]
>
> Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
> I'm already passing the prefix_lines as a new parameter. If that would
> work out, I could probably remove the special common-diff-injecting
> code in diff.c#svn_diff__diff.
>
> Thoughts?

Just from reading what you say here, that last solution sounds much
better.

- Julian

> It will take me some time/experimentation to work out the details, but
> I think that should work ...
>
> Cheers,
Received on 2011-01-05 12:11:41 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.