
Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

From: Folker Schamel <schamel23_at_spinor.com>
Date: Tue, 19 Jul 2011 21:47:29 +0200

> On 7/18/2011 4:37 PM, Folker Schamel wrote:
>> Hi Andy,
>>
>> two thoughts about cyclic merges:
>>
>> 1. Merging should not skip cyclic merges (as the old
>> svnmerge tool does), but must subtract (reverse-merge) the original
>> change first, and then add (merge) the cyclic merge, in order
>> not to lose adaptations of changes.
> I made a different proposal to solve the same problem. Following your
> example, let's say we are merging
> Trunk = (R+S+RSFix) + (T + RSTfix)
> --- where RSTFix is changes to resolve a merge conflict
> into Feature = (R+S+RSFix) + F
>
> In your proposal, you "unmerge" (R+S+RSFix) to get F. Then, having
> separated the stuff that is duplicate from the stuff that is new, you
> can do the "one big diff" style merge from Trunk.
>
> In my proposal, we save RSTfix in our expandable merge_history file, and
> then we can in many cases apply T and RSTFix separately, without any
> duplicates.
>
> Do you think that might be easier?

In the end, your proposal also requires a reverse-merge to calculate
RSTfix. So the difference is basically whether to calculate RSTfix
implicitly on the fly when needed, or in advance and store it.
Which one is easier and/or faster - good question.

The idea behind the on-the-fly reverse-merge approach is
a) to operate purely on existing revisions (no need to store changes
like RSFix separately), and
b) (at least in theory) to have a simple merge algorithm, which basically
just says: "Merge everything over, but first reverse-merge the old
changesets that already exist in the target". This handles adaptations
like RSTfix implicitly and robustly, without having to deal with them
explicitly (at least in theory).
See http://svn.haxx.se/dev/archive-2007-12/0137.shtml
(Note that this algorithm assumes "correct" merge info,
not the current subversion merge info.)
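
As a rough illustration of that rule, here is a minimal Python sketch
using the c100/c101/c102 example quoted below. It is a toy model, not
Subversion code: a tree is a dict of path to content, a changeset maps a
path to an (old, new) pair, and apply_change, reverse_change and
cyclic_merge are names invented for this sketch. It also assumes we
already know which changes in the target were merged (and possibly
adapted) by the source, which is what "correct" merge info would have
to provide.

```python
def apply_change(tree, change):
    """Return a copy of tree with change applied; raise on a conflict."""
    result = dict(tree)
    for path, (old, new) in change.items():
        if result.get(path) != old:
            raise ValueError(f"conflict at {path}")
        result[path] = new
    return result


def reverse_change(change):
    """Build the reverse-merge of a change by swapping old and new."""
    return {path: (new, old) for path, (old, new) in change.items()}


def cyclic_merge(target, source_changes, already_in_target):
    """Reverse-merge the originals first, then merge everything over."""
    for change in reversed(already_in_target):
        target = apply_change(target, reverse_change(change))
    for change in source_changes:
        target = apply_change(target, change)
    return target


# Toy data for the example: c100 is made in A, c101 in B, and c102 is
# B's merge of c100 including an adaptation needed because of c101.
base = {"f": "v0", "g": "v0"}
c100 = {"f": ("v0", "A1")}
c101 = {"g": ("v0", "B1")}
c102 = {"f": ("v0", "A1-adapted")}

branch_a = apply_change(base, c100)
branch_b = apply_change(apply_change(base, c101), c102)

# Skipping the cyclic merge c102 keeps A's unadapted version of f.
naive = apply_change(branch_a, c101)

# Subtract c100, then add c101 and c102: A ends up matching B.
correct = cyclic_merge(branch_a, [c101, c102], [c100])
```

In this toy model the naive result still has the unadapted content for
f, while the reverse-merge variant reproduces B's tree. Real changesets
can overlap in ways a dict of whole-file values does not capture.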

Cheers,
Folker

>> For example, suppose you have two branches A and B.
>> c100 is a change in A.
>> c101 is a change in B.
>> B merges c100 into B (maybe with or without conflict),
>> but has to adapt this change to make it compatible with c101,
>> resulting in c102.
>> Now, A merges all changes from B to A.
>> Then just merging c101 would lose the adaptations made in c102.
>> So the correct behavior is to subtract c100 and then add c101 and c102.
>> Note that if the changesets are not overlapping, the order
>> of the reverse-merges and merges does not matter.
>> But if the changesets are overlapping, then the correct order of
>> reverse-merges and merges can matter.
>>
>> 2. Supporting cyclic merges correctly requires that merge-info
>> only records the direct merge info without carrying over
>> existing merge info.
>>
>> See for example
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427
>>
> I completely agree that "svn newmerge" should be able to handle the case
> that you posted in that message.
>
> I also agree that some of the problems in this merge case come from the
> inadequate data in mergeinfo. As you point out, we can read mergeinfo
> carefully and still not even know the common ancestor of two branches
> being merged. If you know the common ancestor - which is often not that
> far back in this workflow - you can ignore everything before that point.
> Why isn't there a record dropped in with every merge that says "We
> merged X (server + branch + revision) and (all of this other merge
> history from there) at time Y"? This would get dragged into the next
> branch it gets merged with. You could read back through the merge
> tree/graph to find common ancestors. We could be saving those records in
> a new and expanded merge_history.
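
To make the "read back through the merge graph" idea concrete, here is
a small Python sketch. It assumes a hypothetical merge_history that maps
each (branch, revision) node to the nodes it was copied or merged from.
Subversion's mergeinfo stores nothing like this today, and the function
names and sample revisions are made up for illustration.

```python
from collections import deque


def ancestors(history, start):
    """Collect every node reachable from start by following parent links."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in history.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def nearest_common_ancestors(history, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = (ancestors(history, a) | {a}) & (ancestors(history, b) | {b})
    return {
        node
        for node in common
        if not any(node in ancestors(history, other)
                   for other in common if other != node)
    }


# Hypothetical records: feature was copied from trunk@10 and later
# merged trunk@12 into feature@13.
merge_history = {
    ("trunk", 10): [],
    ("trunk", 12): [("trunk", 10)],
    ("trunk", 14): [("trunk", 12)],
    ("feature", 11): [("trunk", 10)],
    ("feature", 13): [("feature", 11), ("trunk", 12)],
}

nearest = nearest_common_ancestors(
    merge_history, ("trunk", 14), ("feature", 13))
```

With these sample records the walk stops at trunk@12 rather than at
the original branch point, so everything before that revision could be
ignored for the next merge.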
>
>>
>> Cheers,
>> Folker
>>
>>> To start the discussion, I will refer to this blog article by Mark
>>> Phippard:
>>>
>>> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>>>
>>> I found the article to be a good overview of the issues. I think that we
>>> need help from Mark. On the other hand, I have seen that Mark sometimes
>>> makes discouraging comments. My work is apparently “hand wavey” and
>>> “proprietary”. I’m used to this treatment because I have 25 developers
>>> who work for me who often think that I am full of crap. However, it might
>>> have a discouraging effect on other contributors. For example, you can
>>> see in this great ticket thread -
>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states “I
>>> do not think it is possible in this design....I think we need to accept
>>> the limitations of the current design and work towards doing the best we
>>> can within that design”. Apparently that was enough to kill progress. I
>>> think we should keep a more open mind going forward.
>>>
>>> I’m going to make some claims that some problems have “straightforward”
>>> solutions. That doesn’t mean they are simple solutions. Handling all of
>>> the merge cases is going to be hard. However, they are straightforward in
>>> the sense that we can discuss the strategy at the high level used in
>>> Mark’s article.
>>>
>>> Let’s consider three issues: subtree mergeinfo, cyclic merge, and tree
>>> change operations.
>>>
>>> SUBTREE MERGEINFO
>>>
>>> Mark notes that reintegrate does not work if you have subtree mergeinfo.
>>> The subtrees potentially make the top-level mergeinfo inaccurate. So,
>>> basically everyone that has looked at merge problems in the past four
>>> years, including Mark, has tried to get rid of subtree mergeinfo. It’s
>>> amazing that Subversion still tries to support this feature. It can’t be
>>> supported in NewMerge.
>>>
>>> In the following sections, we will also see that the mergeinfo data is
>>> too sparse, and we need to replace it with something bigger and more
>>> extensible.
>>>
>>> CYCLIC MERGE
>>>
>>> The case where we merge back and forth between a development or
>>> deployment branch, and trunk, is the base case for merge. It should be
>>> supported. Subversion only supports it with special instructions. This is
>>> the “cyclic merge” problem.
>>>
>>> It seems that we have two basic ways to do a merge. We can grab all of
>>> the changes that we are trying to merge in one big diff between the
>>> branch we are merging from and the branch we are merging into - the
>>> reintegrate merge as described in Mark’s article. Or, we can sequentially
>>> apply or “replay” each of the changes that we want to merge into our
>>> working copy - the “recursive” strategy that is the default for git.
>>>
>>> It seems to me that the “one big diff” and the replay strategy are
>>> closely related. When you are replaying, you grab all of the changes in
>>> any sequence of revisions that doesn’t include a merge as one big
>>> diff. So, the current “one big diff” strategy is a special case of the
>>> replay strategy that applies when there are no intermediate merges from
>>> other branches or cherrypicks.
>>>
>>> But wait! According to this article, we can’t use the replay strategy
>>> because we are missing part of the replay. We lose information that was
>>> used to resolve a merge when composing merge commits. If we had that
>>> information, we could replay individual merges, and handle a higher
>>> percentage of the cyclic merge cases.
>>>
>>> This problem seems to have a straightforward solution. When we commit the
>>> merge, we can stuff the changeset that represents the difference between
>>> the merge, and the commit, into the merge_history. We just need an
>>> extensible merge_history format to hold it.
>>>
>>> It’s totally not clear to me why you need to say “reintegrate” when you
>>> merge to trunk, and why you need to update the branch after you do a
>>> reintegrate merge from it. The computer should be able to remember the
>>> history of merges and it should be obvious which things have been merged
>>> and which revisions have been committed on both branches. The only reason
>>> that I can think of is that the mergeinfo is so sparse that the
>>> computer doesn’t remember enough about the merge history. Would a bigger
>>> and more extensible data format give us a straightforward way to solve
>>> that problem?
>>>
>>> TREE CHANGE
>>>
>>> We can identify tree changes by pattern matching. This is the same tactic
>>> that git uses, without any other tree change tracking. We can identify
>>> when this match is successful because the match is applied, examined by
>>> the merger, and then the merge is committed. In this case we could write
>>> the tree map into the merge_history so that we can map changes
>>> bi-directionally during future merges without guessing again. This is
>>> another case of saving information that we need to replay a merge.
>>>
>>> I think we could get a similar effect by generating a move operation
>>> (normal copy & delete form) as part of the merge. I think that this
>>> mapping would need to be done by updates as well as by explicit merges.
>>>
>>>
>>> EXPERTISE
>>> Who on this list knows enough about the core algorithm used in merge to
>>> critique these suggestions and point to places in the code or
>>> documentation?
>>>
>>>
>>
>
>
Received on 2011-07-19 21:48:09 CEST

This is an archived mail posted to the Subversion Dev mailing list.
