Re: Line diff handled badly (by old algorithm)

From: Joseph Galbraith <galb_at_vandyke.com>
Date: 2006-06-07 01:10:30 CEST

Simon Large wrote:
> Stefan Küng wrote:
>> On 6/6/06, Simon Large <simon@skirridsystems.co.uk> wrote:
>>
>>> I like it :-) It has certainly fixed the spurious matching in the old
>>> algorithm.
>>>
>>> I'm not sure the line diff algorithm is correct yet though, or at least
>>> the colour coding. Try these 2 lines:
>>>
>>> void EventWrite(U8 EventType, U8 Device)
>>> void EventWrite(U8 EventType, U8 Ident)
>>>
>>> I get some very odd colour coding rather than a straight delete of
>>> 'Device' and an add of 'Ident'.
>>
>> Well, it is 'correct'.
>
> I disagree; look at the screenshot again - it seems to be showing the
> wrong information in the colouring. 'Device' is changed to 'Ident', and
> in my Tmerge it shows Device with 'D' deleted, 'evi' unchanged and 'ce'
> added. Ident is shown with 'I' unchanged 'd' added, 'e' unchanged and
> 'nt' deleted. That makes no sense at all to me. The bits marked as
> unchanged should be the same on both sides, and in the same order.
>
> The algorithm seems to work when there is a plain addition or deletion
> of text, but gets confused by changed text.
>
>> The problem is that in such cases, it's hard
>> for the algorithm to know if it would be better to just show a
>> complete replace or still try to show inline (in this case: inword)
>> diffs.
>
> I know that, and I can see why it is difficult. Where there are several
> fragments close together you could perhaps merge them into one if the
> matching parts in the middle are shorter than say 3 characters. For
> example:
>
> int LoadInternal()
> int ReadExternal()
> ----##--##--------
>
> # shows a difference, so you might highlight the Lo/Re and the In/Ex
> separately. But because they are so close you could merge them and show
> LoadIn/ReadEx. That's just an idea, and there may be counter-examples
> where this makes the display worse, so it's not really important.

Hmm... I'm looking at the following changed lines:

Parse_version(*pPacket);
bGoodPacket = Parse_version(*pPacket);

In this case, it is showing bGood added,
the Pa matching and cket = Pa added where
what I'd want to see is 'bGoodPacket = '
added.

I don't know enough to really propose a good
solution... so the take the following with
a grain of salt... they probably won't
work :-)

Maybe what we want is to find the longest identical
substring with-in the two lines and run the diff
alogrithm on either side of that. Also, if the
longest matching substring is less the 25% of the
total line length, we might consider it
an remove / add instead of a change and abandon the
diff algorithm.

Another idea that comes to mind is that we might want
to say the smallest unit that can be added or deleted
is a word... so if we the diff algorithm identified
something smaller than that, we'd expand it to a word.

As Simon says, there are probably counter examples ... so
neither of these may be useful ideas.

Thanks,

Joseph

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tortoisesvn.tigris.org
For additional commands, e-mail: dev-help@tortoisesvn.tigris.org
Received on Wed Jun 7 01:02:58 2006

This message: [ Message body ]
Next message: Suraj Barkale: "Re: Excel diff script"
Previous message: Joseph Galbraith: "New line diff algorithm..."
In reply to: Simon Large: "Re: Line diff handled badly (by old algorithm)"
Next in thread: Simon Large: "Re: Line diff handled badly (by old algorithm)"
Reply: Simon Large: "Re: Line diff handled badly (by old algorithm)"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]