Re: diff wish

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 15 Jun 2011 15:38:56 +0200

2011/6/15 Branko Čibej <brane_at_e-reka.si>:
> On 15.06.2011 14:11, Johan Corveleyn wrote:
>>> If you have a different definition of "mis-synchronizes", please explain.
>> No, I don't mean a broken diff. The diff should at all times be
>> *correct*. That was indeed never questioned.
>>
>> I mean something like the example Neels gave with his initial approach
>> for avoiding the mismatching empty line problem. With the naive
>> solution, he gave an example of where it's not nice:
> [...]
>
> But when would the current "minimal" diff be preferable to the nicest,
> albeit not minimal, diff we can produce? After all, the fix and/or
> patience diff result is not only nicer to look at, it also gives better
> results for blame, which is the other big diff consumer.

Please define "nicest".

Note that I gave an example where, for instance, "patience diff" produces
worse results, IMHO, than the "minimal diff" (right below Neels' example):

[[[
file a

aaaaaa
aaaaaa
bbbbbb
bbbbbb
cccccc
cccccc
abc

file b

abc
aaaaaa
aaaaaa
bbbbbb
bbbbbb
cccccc
cccccc

Patience diff will give:

-aaaaaa
-aaaaaa
-bbbbbb
-bbbbbb
-cccccc
-cccccc
abc
+aaaaaa
+aaaaaa
+bbbbbb
+bbbbbb
+cccccc
+cccccc

Minimal diff will give:

+abc
aaaaaa
aaaaaa
bbbbbb
bbbbbb
cccccc
cccccc
-abc
]]]

Which one is the nicest?
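
For anyone who wants to reproduce the two results above, here is a rough
Python sketch. It is not Subversion's libsvn_diff and not libxdiff: the
"minimal" side is a textbook LCS, and the "patience" side is a simplified
variant (anchor on lines that are unique in both files, keep the longest
increasing run of anchors, recurse in between, fall back to LCS when there
are no anchors; no common prefix/suffix trimming). All function names
(lcs_pairs, longest_increasing, patience_pairs, render) are made up for
this illustration.

```python
from bisect import bisect_left

def lcs_pairs(a, b):
    """Matched (i, j) index pairs of one longest common subsequence."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

def longest_increasing(cands):
    """cands is sorted by a-index; keep the longest run with increasing b-index."""
    tails, tail_idx, prev = [], [], [None] * len(cands)
    for k, (_, bj) in enumerate(cands):
        pos = bisect_left(tails, bj)
        if pos == len(tails):
            tails.append(bj)
            tail_idx.append(k)
        else:
            tails[pos], tail_idx[pos] = bj, k
        prev[k] = tail_idx[pos - 1] if pos else None
    out = []
    k = tail_idx[-1] if tail_idx else None
    while k is not None:
        out.append(cands[k])
        k = prev[k]
    return out[::-1]

def unique_index(seq, lo, hi):
    """Map each line that occurs exactly once in seq[lo:hi] to its index."""
    seen = {}
    for k in range(lo, hi):
        seen[seq[k]] = None if seq[k] in seen else k
    return {line: k for line, k in seen.items() if k is not None}

def patience_pairs(a, b, alo=0, ahi=None, blo=0, bhi=None):
    """Simplified patience matching on a[alo:ahi] versus b[blo:bhi]."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    ua, ub = unique_index(a, alo, ahi), unique_index(b, blo, bhi)
    cands = sorted((i, ub[line]) for line, i in ua.items() if line in ub)
    if not cands:
        # No unique common lines in this range: fall back to plain LCS.
        return [(alo + i, blo + j) for i, j in lcs_pairs(a[alo:ahi], b[blo:bhi])]
    pairs, i, j = [], alo, blo
    for ai, bj in longest_increasing(cands) + [(ahi, bhi)]:
        pairs += patience_pairs(a, b, i, ai, j, bj)
        if ai < ahi:  # skip the end-of-range sentinel
            pairs.append((ai, bj))
        i, j = ai + 1, bj + 1
    return pairs

def render(a, b, pairs):
    """Turn matched pairs into '-', '+' and ' ' (context) lines."""
    out, i, j = [], 0, 0
    for pi, pj in pairs + [(len(a), len(b))]:
        out += ["-" + line for line in a[i:pi]]
        out += ["+" + line for line in b[j:pj]]
        if pi < len(a):
            out.append(" " + a[pi])
        i, j = pi + 1, pj + 1
    return out

block = ["aaaaaa", "aaaaaa", "bbbbbb", "bbbbbb", "cccccc", "cccccc"]
file_a = block + ["abc"]
file_b = ["abc"] + block

print("\n".join(render(file_a, file_b, patience_pairs(file_a, file_b))))
print("----")
print("\n".join(render(file_a, file_b, lcs_pairs(file_a, file_b))))
```

In this sketch "abc" is the only line that is unique in both files, so the
patience variant pins it as its sole anchor and everything else has to move
around it. The LCS variant keeps the six-line block in place instead.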

> Likewise, it'll
> give better locality for resolving merge conflicts. That's why I don't
> understand why Subversion, specifically, would need a --minimal option.

Because it is quite possible that --minimal would give fewer merge
conflicts in a lot of cases. I simply don't know. Do you know of any
research into this?

I just found an interesting (though long) mail thread in the archives
of git_at_vger.kernel.org, discussing pros and cons of patience vs.
regular diff [1]. There is some analysis in there, comparing diffs and
comparing merge conflicts. An interesting one is [2], where some
numbers are gathered on the number of merge conflicts, and how large
they are.

Some quotes:
[[[
The most interesting thing to me was: of the 4072 merges I have in my
local git.git clone, only 66 show a difference.

The next interesting thing: none -- I repeat, none! -- resulted in only
one of both methods having conflicts. In all cases, if patience merge had
conflicts, so had the classical merge, and vice versa. I would have
expected patience merge to handle some conflicts more gracefully.

...

So I restricted the analysis to the non-subtree merges, and now
non-patience merge comes out 6.97297297297297 conflict lines fewer than
patience merge, with a standard deviation of 58.941106657867 (with a total
count of 37 merges).

Note that ~7 lines difference with a standard deviation of ~59 lines is
pretty close to ~0 lines difference.

In the end, the additional expense of patience merge might just not be
worth it.
]]]
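
As a quick sanity check on the "pretty close to ~0" remark, the quoted
numbers can be plugged into a standard-error calculation. This is only a
back-of-the-envelope sketch: it assumes the quoted standard deviation is
the sample standard deviation of the per-merge differences and that the 37
merges can be treated as independent, neither of which the quoted mail
states.

```python
import math

# Figures copied from the quoted mail.
mean_diff = 6.97297297297297   # conflict lines, non-patience minus patience
std_dev = 58.941106657867      # assumed: std. dev. of per-merge differences
n_merges = 37

# Standard error of the mean, and how many standard errors the mean is from 0.
std_err = std_dev / math.sqrt(n_merges)
ratio = mean_diff / std_err

print(f"standard error of the mean: {std_err:.2f} lines")
print(f"mean / standard error:      {ratio:.2f}")
```

Under those assumptions the mean difference is less than one standard
error away from zero, which is consistent with the quoted conclusion that
the two methods are hard to tell apart on this sample.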

I still agree that patience diff often produces nicer diff output for
humans (especially for moved blocks of code, re-indentation and the
like). By anchoring on the unique lines, it has a simple heuristic for
focusing on the lines that are (*usually*) the most interesting and
significant for humans. So I too really like patience diff.

But I don't like the hand-waving claim that it will always be
superior, period. That's just not true. And it would be a big mistake,
IMHO, to only support a heuristic diff.

-- 
Johan

[1] http://git.661346.n2.nabble.com/libxdiff-and-patience-diff-td1452272.html
[2] http://git.661346.n2.nabble.com/libxdiff-and-patience-diff-td1452272i40.html#a2124969
Received on 2011-06-15 15:40:18 CEST

This is an archived mail posted to the Subversion Dev mailing list.
