Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

From: Andy Singleton <andy_at_assembla.com>
Date: Wed, 20 Jul 2011 10:10:43 -0400

  I do not think that the reverse merge strategy is the same as doing
multiple forward merges, and I do not think it covers enough cases. In
your original example, you proposed merging into branch Feature, where
Feature = F+(R+S+RSFix)
You proposed "unmerging" (or even reverting) to get F, then merging in
all of the potentially redundant changes.

Let's consider the case where we make a new change F2, so
Feature = F+(R+S+RSFix)+F2

Suppose F2 is an edit to one of the files or code blocks that were added
or changed in R or S. This is a high-probability case, because people
are most likely to change new code. In this case, there is no way to
reverse merge and still keep F2. However, we can forward merge T and
RSTFix, and still keep F2.
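
To make the F2 case concrete, here is a minimal sketch in Python. It is
a toy model and an assumption on my part, not Subversion's merge code: a
branch is a dict of block name to text, a changeset maps a block to an
(old, new) pair, and a change only applies when the block still holds
the expected old text. The contents given to F, R, S, RSFix, F2, T and
RSTFix are made up for illustration.

```python
class Conflict(Exception):
    """A change whose expected 'before' text does not match the branch."""


def apply_change(state, change):
    """Apply a changeset {block: (old, new)}; None means the block is absent."""
    result = dict(state)
    for block, (old, new) in change.items():
        if result.get(block) != old:
            raise Conflict(f"{block}: expected {old!r}, found {result.get(block)!r}")
        if new is None:
            result.pop(block, None)
        else:
            result[block] = new
    return result


def reverse(change):
    """Invert a changeset so that applying it undoes the original."""
    return {block: (new, old) for block, (old, new) in change.items()}


def apply_all(state, changes):
    """Apply a sequence of changesets in order."""
    for change in changes:
        state = apply_change(state, change)
    return state


# Hypothetical contents for the changes named in this thread.
F = {"f": (None, "f1")}
R = {"r": (None, "r1")}
S = {"s": (None, "s1")}
RSFix = {"r": ("r1", "r2")}    # adapts R so that it works with S
F2 = {"r": ("r2", "r3")}       # new work on Feature, editing code from R
T = {"t": (None, "t1")}
RSTFix = {"s": ("s1", "s2")}   # adapts S so that it works with T

feature = apply_all({}, [F, R, S, RSFix, F2])

# Forward merge: apply only what Feature is missing. F2 survives.
forward = apply_all(feature, [T, RSTFix])

# Reverse merge: try to unmerge (R+S+RSFix) first. F2 gets in the way.
try:
    apply_all(feature, [reverse(RSFix), reverse(S), reverse(R)])
    reverse_merge_conflict = None
except Conflict as exc:
    reverse_merge_conflict = str(exc)

print(forward)
print(reverse_merge_conflict)
```

In this model the forward merge keeps the F2 text in block "r", while
the unmerge stops at the first step because block "r" no longer holds
what RSFix left there. A real tool would presumably ask the user to
resolve that conflict rather than stop, but the extra work is the point.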

On 7/19/2011 3:47 PM, Folker Schamel wrote:
>> On 7/18/2011 4:37 PM, Folker Schamel wrote:
>>> Hi Andy,
>>>
>>> two thoughts about cyclic merges:
>>>
>>> 1. Merging should not skip cyclic merges (as the old
>>> svnmerge tool does), but must subtract (reverse-merge) the original
>>> change first, and then add (merge) the cyclic merge, in order
>>> not to lose adaptations of changes.
>> I made a different proposal to solve the same problem. Following your
>> example, let's say we are merging
>> Trunk = (R+S+RSFix) + (T + RSTFix)
>> --- where RSTFix is the set of changes made to resolve a merge conflict
>> into Feature = (R+S+RSFix) + F
>>
>> In your proposal, you "unmerge" (R+S+RSFix) to get F. Then, having
>> separated the stuff that is duplicate from the stuff that is new, you
>> can do the "one big diff" style merge from Trunk.
>>
>> In my proposal, we save RSTFix in our expandable merge_history file, and
>> then we can in many cases apply T and RSTFix separately, without any
>> duplicates.
>>
>> Do you think that might be easier?
>
> In the end, your proposal also requires a reverse-merge to calculate
> RSTFix. So the difference is basically whether to calculate RSTFix
> on the fly, implicitly, when needed, or in advance and store it.
> Which one is easier and/or faster - good question.
>
> The idea behind the on-the-fly reverse-merge approach is
> a) to operate purely on existing revisions (no need to store changes
> like RSFix separately), and
> b) (at least in theory) a simple merge algorithm, which basically
> just says: "Merge everything over, but reverse-merge existing old
> changesets first", solving this RSTFix adaptation issue on the fly,
> automatically and implicitly, in a robust way, without having to deal
> with adaptations like RSTFix explicitly.
> See http://svn.haxx.se/dev/archive-2007-12/0137.shtml
> (Note that this algorithm assumes "correct" merge info,
> not the current subversion merge info.)
>
> Cheers,
> Folker
>
>>> For example, suppose you have two branches A and B.
>>> c100 is a change in A.
>>> c101 is a change in B.
>>> B merges c100 into B (with or without conflict),
>>> but has to adapt this change to make it compatible with c101,
>>> resulting in c102.
>>> Now, A merges all changes from B to A.
>>> Then just merging c101 would lose the adaptations made in c102.
>>> So the correct behavior is to subtract c100 and then add c101 and c102.
>>> Note that if the changesets are not overlapping, the order
>>> of the reverse-merges and merges does not matter.
>>> But if the changesets are overlapping, then the order of
>>> reverse-merges and merges can matter.
>>>
>>> 2. Supporting cyclic merges correctly requires that merge-info
>>> only records the direct merge info without carrying over
>>> existing merge info.
>>>
>>> See for example
>>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427
>>>
>>>
>> I completely agree that "svn newmerge" should be able to handle the case
>> that you posted in that message.
>>
>> I also agree that some of the problems in this merge case come from the
>> inadequate data in mergeinfo. As you point out, we can read mergeinfo
>> carefully and still not know the common ancestor of two branches
>> being merged. If you know the common ancestor - which is often not that
>> far back in this workflow - you can ignore everything before that point.
>> Why isn't there a record dropped in with every merge that says "We
>> merged X (server + branch + revision) and (all of this other merge
>> history from there) at time Y"? This would get dragged into the next
>> branch it gets merged with. You could read back through the merge
>> tree/graph to find common ancestors. We could be saving those records in
>> a new and expanded merge_history.
>>
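
A rough sketch of the "read back through the merge graph" idea, in
Python. Everything here is an assumption for illustration: the records
are modelled as a dict from a (branch, revision) pair to the list of
nodes it was built from (its predecessor on the same branch, plus any
merge source), and none of it corresponds to an existing Subversion
format or API.

```python
from collections import deque


def ancestors(parents, start):
    """Return every node reachable from start by following parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def nearest_common_ancestor(parents, ours, theirs):
    """Walk back breadth-first from theirs until we reach an ancestor of ours."""
    ours_history = ancestors(parents, ours)
    seen = {theirs}
    queue = deque([theirs])
    while queue:
        node = queue.popleft()
        if node in ours_history:
            return node
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Hypothetical history: feature is copied from trunk@2, then merges trunk@4.
with_merge_record = {
    ("trunk", 2): [("trunk", 1)],
    ("feature", 3): [("trunk", 2)],                  # branch copy
    ("trunk", 4): [("trunk", 2)],
    ("feature", 5): [("feature", 3), ("trunk", 4)],  # recorded merge
    ("trunk", 6): [("trunk", 4)],
}

# The same history if the merge at feature@5 had left no record behind.
without_merge_record = dict(with_merge_record)
without_merge_record[("feature", 5)] = [("feature", 3)]

recorded = nearest_common_ancestor(with_merge_record, ("trunk", 6), ("feature", 5))
unrecorded = nearest_common_ancestor(without_merge_record, ("trunk", 6), ("feature", 5))
print(recorded, unrecorded)
```

With the merge recorded, the walk stops at trunk@4; without it, the best
we can find is the original branch point at trunk@2, so more history has
to be considered on the next merge.
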
>>>
>>> Cheers,
>>> Folker
>>>
>>>> To start the discussion, I will refer to this blog article by Mark
>>>> Phippard:
>>>>
>>>> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>>>>
>>>> I found the article to be a good overview of the issues. I think that
>>>> we need help from Mark. On the other hand, I have seen that Mark
>>>> sometimes makes discouraging comments. My work is apparently “hand
>>>> wavey” and “proprietary”. I’m used to this treatment because I have 25
>>>> developers who work for me who often think that I am full of crap.
>>>> However, it might have a discouraging effect on other contributors.
>>>> For example, you can see in this great ticket thread -
>>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states
>>>> “I do not think it is possible in this design.... I think we need to
>>>> accept the limitations of the current design and work towards doing
>>>> the best we can within that design”. Apparently that was enough to
>>>> kill progress. I think we should keep a more open mind going forward.
>>>>
>>>> I’m going to make some claims that some problems have
>>>> “straightforward” solutions. That doesn’t mean they are simple
>>>> solutions. Handling all of the merge cases is going to be hard.
>>>> However, they are straightforward in the sense that we can discuss the
>>>> strategy at the high level used in Mark’s article.
>>>>
>>>> Let’s consider three issues: subtree mergeinfo, cyclic merge, and tree
>>>> change operations.
>>>>
>>>> SUBTREE MERGEINFO
>>>>
>>>> Mark notes that reintegrate does not work if you have subtree
>>>> mergeinfo. The subtrees potentially make the top-level mergeinfo
>>>> inaccurate. So, basically everyone that has looked at merge problems
>>>> in the past four years, including Mark, has tried to get rid of
>>>> subtree mergeinfo. It’s amazing that Subversion still tries to support
>>>> this feature. It can’t be supported in NewMerge.
>>>>
>>>> In the following sections, we will also see that the mergeinfo data is
>>>> too sparse, and we need to replace it with something bigger and more
>>>> extensible.
>>>>
>>>> CYCLIC MERGE
>>>>
>>>> The case where we merge back and forth between a development or
>>>> deployment branch, and trunk, is the base case for merge. It should be
>>>> supported. Subversion only supports it with special instructions. This
>>>> is the “cyclic merge” problem.
>>>>
>>>> It seems that we have two basic ways to do a merge. We can grab all of
>>>> the changes that we are trying to merge in one big diff between the
>>>> branch we are merging from and the branch we are merging into - the
>>>> reintegrate merge as described in Mark’s article. Or, we can
>>>> sequentially apply or “replay” each of the changes that we want to
>>>> merge into our working copy - the “recursive” strategy that is the
>>>> default for git.
>>>>
>>>> It seems to me that the “one big diff” and the replay strategy are
>>>> closely related. When you are replaying, you grab all of the changes
>>>> in any sequence of revisions that doesn’t include a merge as one big
>>>> diff. So, the current “one big diff” strategy is a special case of the
>>>> replay strategy that applies when there are no intermediate merges
>>>> from other branches or cherrypicks.
>>>>
>>>> But wait! According to this article, we can’t use the replay strategy
>>>> because we are missing part of the replay. We lose information that
>>>> was used to resolve a merge when composing merge commits. If we had
>>>> that information, we could replay individual merges, and handle a
>>>> higher percentage of the cyclic merge cases.
>>>>
>>>> This problem seems to have a straightforward solution. When we commit
>>>> the merge, we can stuff the changeset that represents the difference
>>>> between the merge and the commit into the merge_history. We just
>>>> need an extensible merge_history format to hold it.
>>>>
>>>> It’s totally not clear to me why you need to say “reintegrate” when
>>>> you merge to trunk, and why you need to update the branch after you do
>>>> a reintegrate merge from it. The computer should be able to remember
>>>> the history of merges and it should be obvious which things have been
>>>> merged and which revisions have been committed on both branches. The
>>>> only reason that I can think of is that the mergeinfo is so sparse
>>>> that the computer doesn’t remember enough about the merge history.
>>>> Would a bigger and more extensible data format give us a
>>>> straightforward way to solve that problem?
>>>>
>>>> TREE CHANGE
>>>>
>>>> We can identify tree changes by pattern matching. This is the same
>>>> tactic that git uses, without any other tree change tracking. We can
>>>> identify when this match is successful because the match is applied,
>>>> examined by the merger, and then the merge is committed. In this case
>>>> we could write the tree map into the merge_history so that we can map
>>>> changes bi-directionally during future merges without guessing again.
>>>> This is another case of saving information that we need to replay a
>>>> merge.
>>>>
>>>> I think we could get a similar effect by generating a move operation
>>>> (normal copy & delete form) as part of the merge. I think that this
>>>> mapping would need to be done by updates as well as by explicit
>>>> merges.
>>>>
>>>>
>>>> EXPERTISE
>>>> Who on this list knows enough about the core algorithm used in merge
>>>> to critique these suggestions and point to places in the code or
>>>> documentation?
>>>>
>>>>
>>>
>>
>>
>

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton
Received on 2011-07-20 16:11:32 CEST

This is an archived mail posted to the Subversion Dev mailing list.