2011/6/15 Branko Čibej <brane_at_e-reka.si>:
> On 15.06.2011 14:11, Johan Corveleyn wrote:
>>> If you have a different definition of "mis-synchronizes", please explain.
>> No, I don't mean a broken diff. The diff should at all times be
>> *correct*. That was indeed never questioned.
>>
>> I mean something like the example Neels gave with his initial approach
>> for avoiding the mis-matching empty line problem. With the naive
>> solution, he gave an example of where it's not nice:
> [...]
>
> But when would the current "minimal" diff be preferable to the nicest,
> albeit not minimal, diff we can produce? After all, the fix and/or
> patience diff result is not only nicer to look at, it also gives better
> results for blame, which is the other big diff consumer.
Please define "nicest".
Note that I gave an example where, for instance, "patience diff"
produces worse results, IMHO, than the "minimal diff" (right below
Neels' example):
[[[
file a
aaaaaa
aaaaaa
bbbbbb
bbbbbb
cccccc
cccccc
abc

file b
abc
aaaaaa
aaaaaa
bbbbbb
bbbbbb
cccccc
cccccc

Patience diff will give:
-aaaaaa
-aaaaaa
-bbbbbb
-bbbbbb
-cccccc
-cccccc
 abc
+aaaaaa
+aaaaaa
+bbbbbb
+bbbbbb
+cccccc
+cccccc

Minimal diff will give:
+abc
 aaaaaa
 aaaaaa
 bbbbbb
 bbbbbb
 cccccc
 cccccc
-abc
]]]
Which one is the nicest?
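For anyone who wants to reproduce the two outputs, here is a minimal
Python sketch. It is not Subversion's or git's implementation: lcs_diff
is a textbook LCS-based minimal diff, and patience_diff is a simplified
patience diff (no common prefix/suffix trimming, and an assumed fallback
to lcs_diff when a region has no unique common lines). All function
names are hypothetical.

```python
from collections import Counter

def lcs_diff(a, b):
    """Minimal line diff using a classic LCS table (O(n*m))."""
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(" " + a[i]); i += 1; j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append("-" + a[i]); i += 1
        else:
            out.append("+" + b[j]); j += 1
    out.extend("-" + line for line in a[i:])
    out.extend("+" + line for line in b[j:])
    return out

def longest_increasing(pairs):
    """Longest chain of (i, j) pairs with increasing j (pairs sorted by i)."""
    best = []  # best[k] = longest chain ending at pairs[k]
    for k, (i, j) in enumerate(pairs):
        prev = max((best[p] for p in range(k) if pairs[p][1] < j),
                   key=len, default=[])
        best.append(prev + [(i, j)])
    return max(best, key=len, default=[])

def patience_diff(a, b):
    """Simplified patience diff: anchor on lines unique in both files."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]
    if not pairs:
        # Assumption: fall back to the minimal diff for this region.
        return lcs_diff(a, b)
    out, prev_i, prev_j = [], 0, 0
    for i, j in longest_increasing(pairs):
        out += patience_diff(a[prev_i:i], b[prev_j:j])
        out.append(" " + a[i])
        prev_i, prev_j = i + 1, j + 1
    out += patience_diff(a[prev_i:], b[prev_j:])
    return out

block = ["aaaaaa", "aaaaaa", "bbbbbb", "bbbbbb", "cccccc", "cccccc"]
file_a = block + ["abc"]
file_b = ["abc"] + block

print("\n".join(patience_diff(file_a, file_b)))
print("\n".join(lcs_diff(file_a, file_b)))
```

The only line that is unique in both files is "abc", so the patience
variant anchors on it and has to delete and re-add the whole block
around it, while the LCS variant keeps the six block lines as context.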
> Likewise, it'll
> give better locality for resolving merge conflicts. That's why I don't
> understand why Subversion, specifically, would need a --minimal option.
Because it is quite possible that --minimal gives fewer merge
conflicts in many cases. I simply don't know. Do you know of
any research into this?
I just found an interesting (though long) mail thread in the archives
of git_at_vger.kernel.org, discussing pros and cons of patience vs.
regular diff [1]. There is some analysis in there, comparing diffs and
comparing merge conflicts. An interesting one is [2], where some
numbers are gathered on the number of merge conflicts, and how large
they are.
Some quotes:
[[[
The most interesting thing to me was: of the 4072 merges I have in my
local git.git clone, only 66 show a difference.
The next interesting thing: none -- I repeat, none! -- resulted in only
one of both methods having conflicts. In all cases, if patience merge had
conflicts, so had the classical merge, and vice versa. I would have
expected patience merge to handle some conflicts more gracefully.
...
So I restricted the analysis to the non-subtree merges, and now
non-patience merge comes out 6.97297297297297 conflict lines fewer than
patience merge, with a standard deviation of 58.941106657867 (with a total
count of 37 merges).
Note that ~7 lines difference with a standard deviation of ~59 lines is
pretty close to ~0 lines difference.
In the end, the additional expense of patience merge might just not be
worth it.
]]]
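As a rough sanity check of the "pretty close to ~0" remark in that
quote, one can compare the mean difference with its standard error.
This is only a sketch: it assumes the quoted figures are the sample
mean and sample standard deviation of per-merge differences, and that
the 37 merges can be treated as independent. The variable names are
mine.

```python
import math

mean_diff = 6.97297297297297   # conflict lines, as quoted
std_dev = 58.941106657867      # standard deviation, as quoted
n_merges = 37                  # number of non-subtree merges, as quoted

# Standard error of the mean and the resulting t-like ratio.
std_err = std_dev / math.sqrt(n_merges)
t_stat = mean_diff / std_err

print(f"standard error = {std_err:.2f}, t = {t_stat:.2f}")
```

The standard error comes out at roughly 9.7 lines, so the mean
difference of about 7 lines is well within one standard error of zero
(ratio about 0.7). Under the assumptions above, those numbers do not
show a real difference in either direction.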
I still agree that patience diff often produces nicer diff output for
humans (especially for moved blocks of code, re-indentation and the
like). By focusing on the unique lines, it has a simple heuristic for
concentrating on the lines that are (*usually*) the most interesting
and most significant for humans. So I too really like patience diff.
But I don't like the hand-waving claim that it will always be
superior, period. That's just not true. And it would be a big mistake,
IMHO, to only support a heuristic diff.
--
Johan
[1] http://git.661346.n2.nabble.com/libxdiff-and-patience-diff-td1452272.html
[2] http://git.661346.n2.nabble.com/libxdiff-and-patience-diff-td1452272i40.html#a2124969
Received on 2011-06-15 15:40:18 CEST