[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: diff-optimizations-bytes: how to make diff3 work with prefix/suffix scanning

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 5 Jan 2011 13:44:26 +0100

On Wed, Jan 5, 2011 at 12:11 PM, Julian Foad <julian.foad_at_wandisco.com> wrote:
> On Sat, 2011-01-01, Johan Corveleyn wrote:
>> [ Taking a privately-started discussion with danielsh to the list, in
>> case others have inspiration/insight about this. Question at hand: I'm
>> having trouble making diff3 work with prefix/suffix scanning of the
>> diff-optimizations-bytes branch. Any feedback is highly appreciated
>> :-). ]
>>
>> On Fri, Dec 31, 2010 at 2:38 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> [...snip...]
>> > In diff3 with prefix enabled, I was seeing failures like this:
>> >
>> > EXPECTED:
>> > line1
>> > <<<
>> > line2
>> > ===
>> > line2 modified
>> >>>>
>> >
>> > ACTUAL:
>> > <<<
>> > line1
>> > line2
>> > ===
>> > line1
>> > line2 modified
>> >>>>
>> >
>> > so I assumed that when the prefix code gobbled the shared prefix.  How
>> > to fix it from there was purely guesswork on my part (and I haven't
>> > implemented it).
>> >
>> > These failures would go away if I prepended a "line0 left" and "line0
>> > right" to the file.
>>
>> Yes, I know. That's indeed because the prefix-scanning code gobbled up
>> the identical prefix. That problem is also there in diff2.
>>
>> In diff2, I fixed this by simply pre-pending a piece of "common" diff
>> to the diff linked list, in the function diff.c#svn_diff__diff.
>>
>> >From r1037371:
>> [[[
>> --- subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
>> (original)
>> +++ subversion/branches/diff-optimizations-bytes/subversion/libsvn_diff/diff.c
>> Sun Nov 21 02:36:07 2010
>> @@ -43,6 +43,22 @@ svn_diff__diff(svn_diff__lcs_t *lcs,
>>   svn_diff_t *diff;
>>   svn_diff_t **diff_ref = &diff;
>>
>> +  if (want_common && (original_start > 1))
>> +    {
>> +      /* we have a prefix to skip */
>> +      (*diff_ref) = apr_palloc(pool, sizeof(**diff_ref));
>> +
>> +      (*diff_ref)->type = svn_diff__type_common;
>> +      (*diff_ref)->original_start = 0;
>> +      (*diff_ref)->original_length = original_start - 1;
>> +      (*diff_ref)->modified_start = 0;
>> +      (*diff_ref)->modified_length = modified_start - 1;
>> +      (*diff_ref)->latest_start = 0;
>> +      (*diff_ref)->latest_length = 0;
>> +
>> +      diff_ref = &(*diff_ref)->next;
>> +    }
>
> Hi Johan.
>
> I know this is work in progress but that whole "if" block of code is a
> good example of where a comment above the block would be *really* useful
> to tell the reader what it's for.  :-)

Ok, you're absolutely right. If I can't throw this block away (as
suggested below), I'll add some more comments.

> (Also, all that dereferencing is unnecessary in that particular block,
> but I looked at the rest of the function and saw that that style is used
> in two other blocks where it is not unnecessary.  Personally I would
> find the following style more readable throughout the function:
>
>       {
>         svn_diff_t *new_diff = apr_palloc(pool, sizeof(*new_diff));
>         new_diff->type = ...
>         ...
>
>         *diff_ref = new_diff;
>         diff_ref = &new_diff->next;
>       }
> )

Yep, that's definitely clearer. I just copy-pasted from the other
similar blocks in that function. But you're right.

Now, this style is also used in other similar places in diff3.c and
diff4.c. So if it's decided that it's best to change this to make it
more readable, I guess it should be changed in all those places
(preferably on trunk, it's not specific to the
diff-optimizations-bytes branch).

>>   while (1)
>>     {
>>       if (original_start < lcs->position[0]->offset
>> ]]]
>>
>> Naively, I wanted to do the same for diff3 (along with some other
>> tricks, to make the algorithm aware of the prefix_lines), but it
>> doesn't work. See the patch in attachment.
>>
>> I mean: it works partly: diff-diff3-test.exe passes with the attached
>> patch, but merge_reintegrate_tests.py (1, 2, 10, 13 and 14) and
>> merge_tests.py (61) don't. I haven't studied the test failures in
>> detail (I probably should, but I currently can't wrap my head around
>> those tests).
>>
>> I'm now pondering another approach, to pre-pend an "lcs" piece to the
>> lcs linked list, and let the rest of the algorithm do its work as
>> normal. Inspiration for this came when I was reading the code of
>> diff3.c#svn_diff__resolve_conflict, where a similar trick is used
>> around line 121:
>> [[[
>>     /* If there were matching symbols at the start of
>>      * both sequences, record that fact.
>>      */
>>     if (common_length > 0)
>>       {
>>         lcs = apr_palloc(subpool, sizeof(*lcs));
>>         lcs->next = NULL;
>>         lcs->position[0] = start_position[0];
>>         lcs->position[1] = start_position[1];
>>         lcs->length = common_length;
>>
>>         lcs_ref = &lcs->next;
>>       }
>> ]]]
>>
>> Maybe I can even integrate this logic into lcs.c#svn_diff__lcs, where
>> I'm already passing the prefix_lines as a new parameter. If that would
>> work out, I could probably remove the special common-diff-injecting
>> code in diff.c#svn_diff__diff.
>>
>> Thoughts?
>
> Just from reading what you say here, that last solution sounds much
> better.

Yep. I hope to get this working soon. I did an attempt yesterday, but
ended up with causing an infinite loop somewhere/somehow, and gave up
around 2am :-). I'll give it another try tonight or tomorrow.

Thanks.

-- 
Johan
Received on 2011-01-05 13:45:20 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.