[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: diff wish

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 15 Jun 2011 01:08:48 +0200

On Tue, Jun 14, 2011 at 5:33 PM, Stefan Sperling <stsp_at_elego.de> wrote:
> On Tue, Jun 14, 2011 at 05:21:27PM +0200, Neels J Hofmeyr wrote:
>> Hi Johan,
>> it's been a while and I still haven't sent you my diff wish we briefly
>> touched on the Subversion hackathon.

Hi Neels, thanks for pursuing this further.

>> Here is a fabricated example of why I don't like diff to match empty lines:
>> A couple of lines get replaced by completely different ones. By matching the
>> blank line in the middle, it becomes far less readable, IMHO. In my fantasy
>> dream world, this diff would print:
>> [[[
>> Index: x
>> ===================================================================
>> --- x (revision 1)
>> +++ x (working copy)
>> @@ -4,11 +4,13 @@
>> void aaa()
>> {
>> - if (x)
>> - do(things);
>> -
>> - if (y)
>> - do(stuff);
>> + while (x || y)
>> + {
>> + check(something);
>> + notify(stuff);
>> +
>> + try(somethingelse);
>> + }
>> bb(b);
>> }
>> ]]]

Yeah, that's certainly a nicer diff for human consumption :-). But
strictly speaking it's a larger diff (more lines marked as +/-), so
that makes it less optimal for the current algorithm.

The "minimality" criterion of diff (with the LCS) makes it easy to
reason about, and makes for a nice and clear mathematical definition
of the requested diff result. But I agree that it doesn't necessarily
lead to "good-quality" diffs for human readers.

So: good-quality != minimal, but it's more of a "soft" criterion,
depends on the language, context, ... what lines are important to the
user, ...

Introducing heuristics in one form or another is probably the only way
to improve this. I'm not an expert in this area myself (I'm actually
more of a theoretical mathematician, so I'm naturally skeptical of
anything without a formal proof :-)). But I have also read some good
things about patience diff, like Stefan suggested ...

> Do you know about patience diff?
> http://bramcohen.livejournal.com/73318.html
> I think we should try teaching this algorithm to svn diff at some point.
> It's a lot more generic than just checking for empty lines and should
> yield the results you want.

Or if Morten has some great inspiration along similar lines, that
might be equally good or better...

On Tue, Jun 14, 2011 at 7:53 PM, Morten Kloster <morklo_at_gmail.com> wrote:
> Actually, I was already planning on making the LCS algorithm estimate
> how statistically significant each matching section is (just a simple
> heuristic, of course, nothing mathematically exact) - I need this for the
> proposals in my recent thread "Improvements to diff3 (merge)
> performance" - and the standard diff could then take an option -noflukes
> (for instance) which would only keep the significant matches. That
> should eliminate (or at least reduce) both problems with blank lines and
> similar issues with braces being matched in unrelated code.
> Estimating the significance should be quite quick, so no worries there.
> ---
> Morten

Intuitively, I'd say: let's look into patience diff (or a variation
thereof), because it's already being used in several (D)VCS'es, so it
has already had a lot of exposure. But that's not really a strong
argument :-). Maybe another approach is easier to implement in
libsvn_diff, and yields equally good or even better results ... I
don't know.

One thing I'm not sure about: suppose we have a really good "heuristic
diff", should we then make it the default (and make --minimal an
option), or should we make it optional (--nice)? I guess we'll see
whenever we get at the point where a heuristic diff is implemented

Note that GNU diff uses some heuristics by default (and --minimal is
an option). Maybe GNU diff would already give a "nice" result with
your example with its built-in heuristics?

Received on 2011-06-15 01:09:41 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.