[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Simplification/speed-up of libsvn_diff by eliminating idx

From: Morten Kloster <morklo_at_gmail.com>
Date: Wed, 1 Jun 2011 17:29:56 +0200

On Wed, Jun 1, 2011 at 10:28 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Wed, Jun 1, 2011 at 10:04 AM, Philip Martin
> <philip.martin_at_wandisco.com> wrote:
>> Stefan Fuhrmann <eqfox_at_web.de> writes:
>>
>>>> Updated log message:
>>>> [[[
>>>> Simpler/faster LCS algorithm in libsvn_diff by elimination of idx.
>>>>
>>>> * subversion/libsvn_diff/lcs.c
>>>>   (svn_diff__snake): Removed idx parameter (always 0)
>>>>   (svn_diff__lcs): Removed idx variable (always 0) , let d have either
>>>>    sign, and moved the origo of fp by d. New version no longer chooses
>>>>    the shorter file as the "original", and can thus give different LCS's
>>>>    depending on the order of the input files even when the files have
>>>>    different lengths. Speed is ~15% faster for core lcs algorithm.
>>>> ]]]
>>> Committed as r1128857. I took the liberty to
>>> weaken the speed-up claim.
>>
>> This commit fixed the spurious conflict on update in issue 3449, I
>> suppose the chosen LCS is "better" in this particular case.  Are there
>> likely to be cases where the chosen LCS is "worse" so that previously
>> clean merges are now conflicts?
>>
>> See r1130036 for a 3449 regression test.
>
> I was wondering about the exact same thing yesterday :). I suspected
> that the sudden resolution to issue 3449 was related to this. The
> chosen LCS is indeed better now, because it follows the same direction
> for both comparisons involved in the update/merge (i.e. (Base, Mine)
> and (Base, Theirs), whereas previously the LCS could be reversed based
> on which side was shorter). See Morten's explanation (and my reply)
> about this a couple of mails ago in this thread.
>
> But I'm also unsure about your second question: are the cases where
> there was previously a clean merge, and now a "spurious" conflict. I
> suppose one could come up with an example, but I guess it would be
> more up to interpretation whether or not there should be (or should
> have been) a conflict. I have to think some more about this. Maybe
> Morten has an idea?
>
> Cheers,
> --
> Johan
>

For merges, the idx patch (r1128857) should be almost exclusively
beneficial. As mentioned before, it's not a question of whether the
individual LCS's are better or worse; the important thing is that the
two primary LCS's for the merge (Base vs. Theirs and Base vs. Mine)
are done in a consistent manner. If a section of a file is identical
in Mine and Theirs, and it is surrounded by correctly matched
unchanged code, then it is guaranteed to survive the merge without
conflict now.

If, however, the changes are slightly different in Mine and Theirs (but
without any obvious conflict), then it is possible that it would merge
without conflict before the patch but not after. For (I believe) most of
these cases, it is necessary that the lengths are such that the old
version switches the order for BOTH diffs, i.e. Base is longer than
both Mine and Theirs. Thus the old version will also (by "chance")
perform the diffs in a consistent manner, but (again by "chance")
a manner that happens to avoid conflict for those particular changes.

Here's a carefully crafted example where this is not required:

abcde (Base)
acd (Theirs)
acbd (Mine)

Here, there is an ambiguity of whether to match the 'b's or the 'c's
between Base and Mine, which determines whether there's a
conflict with Theirs. Note that if you remove 'e' from Base, then
the old version will also give the same conflict. As Johan said, it's
not clear whether this example should conflict or not.

For blame, there is less, if any, benefit from the patch. If a revision
fully reverts the previous revision, then the old version would tend
to give the maximal preserved lines (of course, one can also argue
whether or not this is "correct"), whereas the patched version
has a chance of matching in a different manner when going
back (since it doesn't know it's going back). Of course, the old
version only guarantees this benefit if you revert the whole file;
if you revert only a part, the lengths might be such that you don't
get the same match.

One possibility that I think would maximize
the number of preserved lines (total age of lines) would be to
alternate the order, i.e. each revision is diff'ed the same way with
both its neighbors. The difference would probably be minimal,
though, and as stated above, it's not clear whether increasing the
total age of lines would truly be better.

All around, I think the patched version is about as good as we're
likely to get on these issues without major changes. :-)

Morten
Received on 2011-06-01 17:30:29 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.