Fixing merge - Subtree, Cyclic, and Tree Change cases

From: Andy Singleton <andy_at_assembla.com>
Date: Sun, 17 Jul 2011 18:03:42 -0400

To start the discussion, I will refer to this blog article by Mark Phippard:

http://blogs.collab.net/subversion/2008/07/subversion-merg/

I found the article to be a good overview of the issues. I think that we
need help from Mark. On the other hand, I have seen that Mark sometimes
makes discouraging comments. My work is apparently "hand wavey" and
"proprietary". I'm used to this treatment because I have 25 developers
who work for me who often think that I am full of crap. However, it
might have a discouraging effect on other contributors. For example, you
can see in this great ticket thread -
http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I
do not think it is possible in this design....I think we need to accept
the limitations of the current design and work towards doing the best we
can within that design." Apparently that was enough to kill progress. I
think we should keep a more open mind going forward.

I'm going to make some claims that some problems have "straightforward"
solutions. That doesn't mean they are simple solutions. Handling all of
the merge cases is going to be hard. However, they are straightforward
in the sense that we can discuss the strategy at the high level used in
Mark's article.

Let's consider three issues: subtree mergeinfo, cyclic merge, and tree
change operations.

SUBTREE MERGEINFO

Mark notes that reintegrate does not work if you have subtree mergeinfo.
The subtrees potentially make the top-level mergeinfo inaccurate. So,
basically everyone who has looked at merge problems in the past four
years, including Mark, has tried to get rid of subtree mergeinfo. It's
amazing that Subversion still tries to support this feature. It can't be
supported in NewMerge.
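
To make the inaccuracy concrete, here is a minimal sketch of how a subtree can contradict the top-level record. It assumes merged revisions are held as plain Python sets, which is a simplification of the real svn:mergeinfo property; the function name and the paths are hypothetical.

```python
from typing import Dict, Set


def revisions_missing_in_subtrees(
    root_merged: Set[int],
    subtree_merged: Dict[str, Set[int]],
) -> Dict[str, Set[int]]:
    """Report, per subtree, the revisions the top level claims are merged
    but the subtree's own record says are not.

    An empty result means the top-level record is accurate for every
    subtree that carries its own record.
    """
    gaps = {}
    for path, revs in subtree_merged.items():
        missing = root_merged - revs
        if missing:
            gaps[path] = missing
    return gaps


# Hypothetical data: the root says r1-r10 were merged, but one subtree
# only received r1-r5.
root = set(range(1, 11))
subtrees = {"src/lib": set(range(1, 6)), "docs": set(range(1, 11))}
gaps = revisions_missing_in_subtrees(root, subtrees)
```

Any non-empty result is a case where a merge that trusts only the top-level record would skip changes that a subtree still needs.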

In the following sections, we will also see that the mergeinfo data is
too sparse, and we need to replace it with something bigger and more
extensible.

CYCLIC MERGE

The case where we merge back and forth between a development or
deployment branch, and trunk, is the base case for merge. It should be
supported. Subversion only supports it with special instructions. This
is the "cyclic merge" problem.

It seems that we have two basic ways to do a merge. We can grab all of
the changes that we are trying to merge in one big diff between the
branch we are merging from and the branch we are merging into - the
reintegrate merge as described in Mark's article. Or, we can
sequentially apply or "replay" each of the changes that we want to merge
into our working copy - the "recursive" strategy that is the default for
git.

It seems to me that the "one big diff" and the replay strategy are
closely related. When you are replaying, you grab all of the changes in
any sequence of revisions that doesn't include a merge as one big diff.
So, the current "one big diff" strategy is a special case of the replay
strategy that applies when there are no intermediate merges from other
branches or cherrypicks.
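
The relationship can be sketched as a planning step that splits the revisions to be merged into runs. This is only an illustration of the idea, not how Subversion or git implement it; the Revision type and its is_merge flag are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    number: int
    # Hypothetical flag: True if this commit recorded a merge or cherrypick.
    is_merge: bool = False


def replay_plan(revisions: List[Revision]) -> List[List[int]]:
    """Group consecutive non-merge revisions into runs.

    Each run can be applied as one big diff; each merge revision becomes
    its own step so it can be replayed individually.
    """
    plan: List[List[int]] = []
    run: List[int] = []
    for rev in revisions:
        if rev.is_merge:
            if run:
                plan.append(run)
                run = []
            plan.append([rev.number])
        else:
            run.append(rev.number)
    if run:
        plan.append(run)
    return plan


# With an intermediate merge at r12, the plan has three steps.
mixed = replay_plan(
    [Revision(10), Revision(11), Revision(12, True), Revision(13), Revision(14)]
)
# With no merges at all, the plan collapses to a single big diff.
plain = replay_plan([Revision(10), Revision(11), Revision(12)])
```

When the plan has exactly one step, replay and "one big diff" are the same operation, which is the special case described above.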

But wait! According to this article, we can't use the replay strategy
because we are missing part of the replay. We lose information that was
used to resolve a merge when composing merge commits. If we had that
information, we could replay individual merges, and handle a higher
percentage of the cyclic merge cases.

This problem seems to have a straightforward solution. When we commit
the merge, we can stuff the changeset that represents the difference
between the automatic merge result and what was actually committed into
the merge_history. We just need an extensible merge_history format to
hold it.
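
As a rough sketch of what such a record could look like, the following stores the resolution changeset next to the merged revisions and leaves room for extra fields. The merge_history format does not exist yet, so every field name here is an assumption, and JSON is just one possible extensible encoding.

```python
import difflib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class MergeRecord:
    # All field names are hypothetical; no such format exists today.
    source_branch: str
    merged_revisions: List[int]
    # Diff between the automatic merge result and what was committed.
    resolution_patch: str = ""
    # Free-form slot so later tools can add data without a format change.
    extensions: Dict[str, object] = field(default_factory=dict)


def resolution_patch(merged_text: str, committed_text: str) -> str:
    """Return a unified diff from the raw merge result to the committed text.

    The result is an empty string when the merge was committed unchanged.
    """
    diff = difflib.unified_diff(
        merged_text.splitlines(keepends=True),
        committed_text.splitlines(keepends=True),
        fromfile="merge-result",
        tofile="committed",
    )
    return "".join(diff)


def serialize(record: MergeRecord) -> str:
    return json.dumps(asdict(record), sort_keys=True)


def deserialize(text: str) -> MergeRecord:
    return MergeRecord(**json.loads(text))


patch = resolution_patch("a\nconflict\n", "a\nresolved\n")
record = MergeRecord("branches/dev", [12, 13], patch, {"tree_map": {}})
restored = deserialize(serialize(record))
```

The extensions slot is the part that matters for the rest of this proposal: it is where a tree map or other replay information could go later.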

It's totally not clear to me why you need to say "reintegrate" when you
merge to trunk, and why you need to update the branch after you do a
reintegrate merge from it. The computer should be able to remember the
history of merges and it should be obvious which things have been merged
and which revisions have been committed on both branches. The only
reason that I can think of is that the mergeinfo is so sparse that
the computer doesn't remember enough about the merge history. Would a
bigger and more extensible data format give us a straightforward way to
solve that problem?
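
One way to picture the answer is a sketch that works out which revisions still need merging from a richer history, with no "reintegrate" hint from the user. It assumes each merge record names both branches and the revision its commit created; the MergeEvent type is hypothetical, and dropping merge commits wholesale is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class MergeEvent:
    # Hypothetical record of one committed merge.
    source: str
    target: str
    merged_revisions: FrozenSet[int]  # source revisions that were merged
    commit_revision: int              # revision created on the target


def eligible_revisions(
    source_commits: List[int],
    history: List[MergeEvent],
    source: str,
    target: str,
) -> List[int]:
    """List source revisions that still need to be merged into target.

    Skips revisions already merged in this direction, and revisions on the
    source that are themselves merges coming from the target.
    """
    already_merged = set()
    merges_from_target = set()
    for event in history:
        if event.source == source and event.target == target:
            already_merged |= event.merged_revisions
        elif event.source == target and event.target == source:
            merges_from_target.add(event.commit_revision)
    return [
        rev
        for rev in source_commits
        if rev not in already_merged and rev not in merges_from_target
    ]


history = [
    # trunk r7 was merged into the branch, creating r8 on the branch.
    MergeEvent("trunk", "branches/dev", frozenset({7}), 8),
    # branch r5 and r6 were merged into trunk, creating r10 on trunk.
    MergeEvent("branches/dev", "trunk", frozenset({5, 6}), 10),
]
todo = eligible_revisions([5, 6, 8, 9], history, "branches/dev", "trunk")
```

Skipping r8 entirely would lose any edits made while resolving that merge, which is exactly why the resolution changeset needs to be recorded alongside the merge.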

TREE CHANGE

We can identify tree changes by pattern matching. This is the same
tactic that git uses, without any other tree change tracking. We can
identify when this match is successful because the match is applied,
examined by the merger, and then the merge is committed. In this case we
could write the tree map into the merge_history so that we can map
changes bi-directionally during future merges without guessing again.
This is another case of saving information that we need to replay a merge.
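
A minimal sketch of a saved tree map and its bi-directional use follows. It assumes the map is a dictionary of old path prefixes to new ones, written without trailing slashes; the function names and paths are hypothetical.

```python
from typing import Dict


def map_path(path: str, tree_map: Dict[str, str]) -> str:
    """Translate a path through the tree map using the longest matching prefix.

    Prefixes match only on whole path components, so "src/util" does not
    match "src/utility/x.c". Paths with no match are returned unchanged.
    """
    best = None
    for old in tree_map:
        if path == old or path.startswith(old + "/"):
            if best is None or len(old) > len(best):
                best = old
    if best is None:
        return path
    return tree_map[best] + path[len(best):]


def invert(tree_map: Dict[str, str]) -> Dict[str, str]:
    """Build the reverse map for merges going in the other direction."""
    return {new: old for old, new in tree_map.items()}


# Hypothetical map confirmed by the merger and saved with the merge.
saved = {"src/util": "lib/common"}
forward = map_path("src/util/io.c", saved)
backward = map_path(forward, invert(saved))
```

Because the map was confirmed when the merge was committed, later merges in either direction can reuse it instead of pattern matching again.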

I think we could get a similar effect by generating a move operation
(normal copy & delete form) as part of the merge. I think that this
mapping would need to be done by updates as well as by explicit merges.

EXPERTISE

Who on this list knows enough about the core algorithm used in merge to
critique these suggestions and point to places in the code or documentation?
Received on 2011-07-18 00:04:28 CEST