
Re: Line diff handled badly (by old algorithm)

From: Simon Large <simon_at_skirridsystems.co.uk>
Date: 2006-06-07 22:35:17 CEST

Joseph Galbraith wrote:
> Hmm... I'm looking at the following changed lines:
> Parse_version(*pPacket);
> bGoodPacket = Parse_version(*pPacket);
> In this case, it is showing bGood added,
> the Pa matching and cket = Pa added where
> what I'd want to see is 'bGoodPacket = '
> added.
> I don't know enough to really propose a good
> solution... so take the following with
> a grain of salt... they probably won't
> work :-)
> Maybe what we want is to find the longest identical
> substring with-in the two lines and run the diff
> algorithm on either side of that.

You really need to do that recursively, which is where it gets tricky.
After picking out the longest matching substring, you then need to look
on either side and pick out the next longest matching substrings until
you get a tree of matching and non-matching fragments.
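A minimal C++ sketch of that recursive split follows. It assumes a simple model where a line diff is a list of Equal/Removed/Added fragments. The names FragKind, Fragment, Match, longestCommonSubstring and lineDiff are hypothetical and are not TortoiseSVN code. The sketch finds the longest common substring with the textbook O(n*m) dynamic-programming table, emits it as Equal, and then recurses on the pieces to its left and right. This is essentially Ratcliff/Obershelp "gestalt pattern matching", which is the same idea Python's difflib.SequenceMatcher is built on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One fragment of an intra-line diff (hypothetical model).
enum class FragKind { Equal, Removed, Added };

struct Fragment {
    FragKind kind;
    std::string text;
};

// Start of the match in a, start in b, and its length.
struct Match { std::size_t a, b, len; };

// Longest common substring by dynamic programming:
// O(a.size() * b.size()) time, O(b.size()) extra memory.
Match longestCommonSubstring(const std::string& a, const std::string& b) {
    Match best{0, 0, 0};
    // After row i, cur[j + 1] is the length of the longest common suffix
    // of a[0..i] and b[0..j]; prev holds the previous row.
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            cur[j + 1] = (a[i] == b[j]) ? prev[j] + 1 : 0;
            if (cur[j + 1] > best.len) {
                best = Match{i + 1 - cur[j + 1], j + 1 - cur[j + 1], cur[j + 1]};
            }
        }
        std::swap(prev, cur);
    }
    return best;
}

// Split around the longest match, then recurse on the left and right pieces.
void diffRange(const std::string& a, const std::string& b,
               std::vector<Fragment>& out) {
    Match m = longestCommonSubstring(a, b);
    if (m.len == 0) {
        // Nothing in common: the old text is removed, the new text is added.
        if (!a.empty()) out.push_back({FragKind::Removed, a});
        if (!b.empty()) out.push_back({FragKind::Added, b});
        return;
    }
    diffRange(a.substr(0, m.a), b.substr(0, m.b), out);
    out.push_back({FragKind::Equal, a.substr(m.a, m.len)});
    diffRange(a.substr(m.a + m.len), b.substr(m.b + m.len), out);
}

std::vector<Fragment> lineDiff(const std::string& oldLine,
                               const std::string& newLine) {
    std::vector<Fragment> out;
    diffRange(oldLine, newLine, out);
    return out;
}
```

For the two lines quoted above, lineDiff("Parse_version(*pPacket);", "bGoodPacket = Parse_version(*pPacket);") returns an Added fragment "bGoodPacket = " followed by an Equal fragment "Parse_version(*pPacket);". That is the result Joseph was after. The recursion depth can grow with the line length, so a real implementation would probably want a depth limit or an explicit work stack.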

Also, finding the longest matching substring sounds hard to do
efficiently and needs a huge number of compares, so the algorithm would
have to duck out if the line is too long (I remember someone on
this list having a file format with a single line 30K long).
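Here is one way to make "duck out" concrete. In a straightforward implementation, the dynamic-programming search fills one cell per character pair. A single search over lines of length n and m therefore costs about n*m compares, and the recursive split repeats that on each fragment. For a 30K-character line compared with a similar one, that is roughly 9 * 10^8 compares for the first search alone. Linear-time methods based on suffix trees do exist, but they are considerably more complex.

The sketch below is a hedged example of a guard that runs first. The names kMaxCompareCells and worthCharDiff are hypothetical, and the budget of one million cells is an arbitrary assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical budget: the most DP cells (character compares) we are willing
// to spend on the first longest-substring search for one pair of lines.
constexpr std::size_t kMaxCompareCells = 1000000;

// Returns true if oldLine.size() * newLine.size() fits in the budget.
// Dividing instead of multiplying avoids overflow on huge lines.
bool worthCharDiff(const std::string& oldLine, const std::string& newLine) {
    if (oldLine.empty() || newLine.empty()) return true;  // trivially cheap
    return oldLine.size() <= kMaxCompareCells / newLine.size();
}
```

When the guard returns false, the viewer would fall back to marking the whole line as changed.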

> Also, if the
> longest matching substring is less than 25% of the
> total line length, we might consider it
> a remove / add instead of a change and abandon the
> diff algorithm.
> Another idea that comes to mind is that we might want
> to say the smallest unit that can be added or deleted
> is a word... so if the diff algorithm identified
> something smaller than that, we'd expand it to a word.
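The 25% rule could be checked before the recursive split runs. The sketch below assumes that "total line length" means the longer of the two lines, since the suggestion does not say which line is meant. The names kMinMatchRatio, longestCommonSubstringLength and treatAsReplace are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Below this ratio the pair is shown as remove + add rather than a change.
constexpr double kMinMatchRatio = 0.25;

// Length of the longest common substring, O(a.size() * b.size()) DP.
std::size_t longestCommonSubstringLength(const std::string& a,
                                         const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    std::size_t best = 0;
    for (char ca : a) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            cur[j + 1] = (ca == b[j]) ? prev[j] + 1 : 0;
            best = std::max(best, cur[j + 1]);
        }
        std::swap(prev, cur);
    }
    return best;
}

// Assumption: "total line length" is the longer of the two lines.
bool treatAsReplace(const std::string& oldLine, const std::string& newLine) {
    std::size_t total = std::max(oldLine.size(), newLine.size());
    if (total == 0) return false;  // two empty lines: nothing to show
    return longestCommonSubstringLength(oldLine, newLine) < kMinMatchRatio * total;
}
```

Under this rule, the bGoodPacket example still counts as a change, because it has a 24-character match in a 38-character line. Two lines with nothing in common fall back to a remove / add.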

I wondered about that too. But if you have a long identifier which you
make a small change to, it makes it harder to see what the change was. I
sometimes fail to release the shift key early enough when typing camel
case, and that can lead to hard-to-spot differences (depending on your
font), e.g.:

Lucida Console is a reasonable looking monospace font, but some letters
have poor case differentiation, e.g. cCkKoOpPsSuUvVwWxXzZ
Hmm, that's worse than I thought. Back to Courier :-(

And of course you need to define what makes a word boundary, which is
(programming) language-dependent.
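The word-expansion idea can be sketched by widening each changed range so that it never starts or ends inside a word. Word boundaries are language-dependent, so the sketch makes an assumption about what a word character is: C-style identifier characters (letters, digits, '_') plus a set of extras supplied by the caller, such as "-" for CSS or Lisp. The helpers isWordChar and expandToWords are hypothetical:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Assumed definition of a word character: letters, digits and '_'
// (C/C++ identifiers), plus language-specific extras from the caller,
// e.g. "-" for CSS or Lisp, "$" for Perl or PHP.
bool isWordChar(char c, const std::string& extra) {
    unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) != 0 || c == '_' ||
           extra.find(c) != std::string::npos;
}

// Widen the half-open range [begin, end) of `line` so that neither end
// sits between two word characters, i.e. the range never splits a word.
std::pair<std::size_t, std::size_t> expandToWords(const std::string& line,
                                                  std::size_t begin,
                                                  std::size_t end,
                                                  const std::string& extra = "") {
    // Move begin left while the characters on both sides of it are word chars.
    while (begin > 0 && begin < line.size() &&
           isWordChar(line[begin - 1], extra) && isWordChar(line[begin], extra)) {
        --begin;
    }
    // Move end right under the same condition.
    while (end > 0 && end < line.size() &&
           isWordChar(line[end - 1], extra) && isWordChar(line[end], extra)) {
        ++end;
    }
    return {begin, end};
}
```

Applied to the range [0, 5) of "bGoodPacket = Parse_version(*pPacket);" (the "bGood" fragment), this widens the range to "bGoodPacket". The same behaviour shows the drawback described above: a one-letter case slip inside a long identifier would be highlighted as a change to the whole identifier.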


   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org
Received on Wed Jun 7 22:34:44 2006

This is an archived mail posted to the TortoiseSVN Dev mailing list.
