
Re: diff wish

From: Branko Čibej <brane_at_e-reka.si>
Date: Wed, 15 Jun 2011 12:34:31 +0200

On 15.06.2011 01:08, Johan Corveleyn wrote:
> On Tue, Jun 14, 2011 at 5:33 PM, Stefan Sperling <stsp_at_elego.de> wrote:
>> On Tue, Jun 14, 2011 at 05:21:27PM +0200, Neels J Hofmeyr wrote:
>>> Hi Johan,
>>>
>>> it's been a while and I still haven't sent you my diff wish, which we briefly
>>> touched on at the Subversion hackathon.
> Hi Neels, thanks for pursuing this further.
>
>>> Here is a fabricated example of why I don't like diff to match empty lines:
>>> A couple of lines get replaced by completely different ones. By matching the
>>> blank line in the middle, it becomes far less readable, IMHO. In my fantasy
>>> dream world, this diff would print:
>>>
>>> [[[
>>> Index: x
>>> ===================================================================
>>> --- x (revision 1)
>>> +++ x (working copy)
>>> @@ -4,11 +4,13 @@
>>>
>>>  void aaa()
>>>  {
>>> -  if (x)
>>> -    do(things);
>>> -
>>> -  if (y)
>>> -    do(stuff);
>>> +  while (x || y)
>>> +  {
>>> +    check(something);
>>> +    notify(stuff);
>>> +
>>> +    try(somethingelse);
>>> +  }
>>>
>>>    bb(b);
>>>  }
>>> ]]]
> Yeah, that's certainly a nicer diff for human consumption :-). But
> strictly speaking it's a larger diff (more lines marked as +/-), so
> that makes it less optimal for the current algorithm.
>
> The "minimality" criterion of diff (with the LCS) makes it easy to
> reason about, and makes for a nice and clear mathematical definition
> of the requested diff result. But I agree that it doesn't necessarily
> lead to "good-quality" diffs for human readers.
>
> So: good-quality != minimal, but it's more of a "soft" criterion,
> depends on the language, context, ... what lines are important to the
> user, ...
>
> Introducing heuristics in one form or another is probably the only way
> to improve this. I'm not an expert in this area myself (I'm actually
> more of a theoretical mathematician, so I'm naturally skeptical of
> anything without a formal proof :-)). But I have also read some good
> things about patience diff, like Stefan suggested ...
>
>> Do you know about patience diff?
>> http://bramcohen.livejournal.com/73318.html
>> I think we should try teaching this algorithm to svn diff at some point.
>> It's a lot more generic than just checking for empty lines and should
>> yield the results you want.
> Or if Morten has some great inspiration along similar lines, that
> might be equally good or better...
>
> On Tue, Jun 14, 2011 at 7:53 PM, Morten Kloster <morklo_at_gmail.com> wrote:
>> Actually, I was already planning on making the LCS algorithm estimate
>> how statistically significant each matching section is (just a simple
>> heuristic, of course, nothing mathematically exact) - I need this for the
>> proposals in my recent thread "Improvements to diff3 (merge)
>> performance" - and the standard diff could then take an option -noflukes
>> (for instance) which would only keep the significant matches. That
>> should eliminate (or at least reduce) both problems with blank lines and
>> similar issues with braces being matched in unrelated code.
>>
>> Estimating the significance should be quite quick, so no worries there.
>> ---
>> Morten
> Intuitively, I'd say: let's look into patience diff (or a variation
> thereof), because it's already being used in several (D)VCS'es, so it
> has already had a lot of exposure. But that's not really a strong
> argument :-). Maybe another approach is easier to implement in
> libsvn_diff, and yields equally good or even better results ... I
> don't know.
>
> One thing I'm not sure about: suppose we have a really good "heuristic
> diff", should we then make it the default (and make --minimal an
> option), or should we make it optional (--nice)? I guess we'll see
> whenever we get to the point where a heuristic diff is implemented
> :-).
>
> Note that GNU diff uses some heuristics by default (and --minimal is
> an option). Maybe GNU diff would already give a "nice" result with
> your example with its built-in heuristics?

I'd say not to worry about --minimal and --nice and whatnot. Just make
diff output the sanest, nicest diff it can find. I think it's a bad idea
to give diff user-visible options that change the output in ways that
are hard to explain (shuffling lines around, as opposed to, e.g., using
a completely different diff format).

-- Brane
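
To make the trade-off discussed in this thread concrete, here is a minimal
Python sketch run on Neels' fabricated example. It is not libsvn_diff code,
and it is not a complete patience diff. lcs_matches() is a textbook
dynamic-programming LCS that stands in for the "minimal" criterion.
unique_line_anchors() performs only the first step of patience diff as
described in Bram Cohen's post: it keeps lines that occur exactly once in
both files and takes their longest increasing subsequence. A full patience
diff would also trim common leading and trailing lines and recurse between
the anchors; that part is omitted. The function names, and the indentation
assumed for the example lines, are illustrative assumptions.

```python
import bisect
from collections import Counter

# Neels' example; the indentation is an assumption made for this sketch.
OLD = ["void aaa()", "{", "  if (x)", "    do(things);", "",
       "  if (y)", "    do(stuff);", "", "  bb(b);", "}"]
NEW = ["void aaa()", "{", "  while (x || y)", "  {", "    check(something);",
       "    notify(stuff);", "", "    try(somethingelse);", "  }", "",
       "  bb(b);", "}"]


def lcs_matches(a, b):
    """Return (i, j) index pairs of one longest common subsequence."""
    n, m = len(a), len(b)
    # length[i][j] is the LCS length of a[i:] and b[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def unique_line_anchors(a, b):
    """Match lines unique in both inputs, keeping the longest in-order run."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    candidates = [(i, pos_b[line]) for i, line in enumerate(a)
                  if count_a[line] == 1 and line in pos_b]
    # Longest increasing subsequence on the b-positions (patience sorting).
    tails, tail_idx = [], []          # smallest tail per run length
    prev = [None] * len(candidates)   # back-pointers to rebuild the run
    for k, (_, j) in enumerate(candidates):
        p = bisect.bisect_left(tails, j)
        prev[k] = tail_idx[p - 1] if p > 0 else None
        if p == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[p] = j
            tail_idx[p] = k
    anchors = []
    k = tail_idx[-1] if tail_idx else None
    while k is not None:
        anchors.append(candidates[k])
        k = prev[k]
    return anchors[::-1]


if __name__ == "__main__":
    print("LCS matches:   ", [OLD[i] for i, _ in lcs_matches(OLD, NEW)])
    print("unique anchors:", [OLD[i] for i, _ in unique_line_anchors(OLD, NEW)])
```

On this input the LCS pairs up both blank lines, including the one in the
middle of the rewritten body, because every extra match shrinks the diff.
The unique-line step never anchors on a blank line, since blank lines occur
twice in each file. Whether the final output then resembles Neels' wished-for
diff depends on how the regions between anchors are handled.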
Received on 2011-06-15 12:35:08 CEST

This is an archived mail posted to the Subversion Dev mailing list.