[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Subtree mergeinfo -- what I learnt at the Hackathon

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 20 Jun 2012 15:59:41 +0100 (BST)

Warning: long email.  Merge gurus and enthusiasts please comment!

At the Hackathon last week, the biggest single topic I discussed with others was the handling of subtree merges: both the current handling (which Paul told me a lot about), and how we could model them in a more pure/general/consistent way in future (which Brane helped me think about).

This is my understanding.

SUMMARY

The current handling is that a normal 'sync' merge will merge any eligible revisions from each subtree.  It uses relative-path equivalence as the definition of the 'corresponding' subtree.

For the future, it may be better to match subtrees in one branch of the merge to subtrees in the other branch, with the correspondence determined by following their own lines of history, including renames.  The merging of a subtree would then be consistent with what would currently happen if you choose to merge that subtree by itself, as the root of a merge -- that is, its location history *would* be followed.

We need to decide about how the 'Symmetric merge' should handle subtrees, initially.  This should be made in the light of both the need for backward compatibility and thought about where the merge capability is headed.

CURRENT HANDLING OF SUBTREE MERGEINFO

An ordinary 'sync' merge that considers mergeinfo will consider mergeinfo on subtrees as well as mergeinfo on the merge target root node.  For each target subtree, it will merge any 'eligible' revisions from the corresponding subtree in the source.

  Example:
    with a merge-root node (named A in the source, B in the target,
    and O at the YCA), and a subtree (named 'foo' at all times):

  A         [x] --- [x] --- [x] -------------------
  A/foo     [x]     [x]     [x]                 \
           /                   \ r7: some        \ r8:
  O     [ ] r4:     r5:     r6: \ set of          \sync
  O/foo [ ] cp+mod  mod     mod  \ subtree         \merge
           \                      \ merges          \
  B         [x] ------------------ [x] ------------ [x]
  B/foo     [x]                    [x]              [x]
  mergeinfo ()                     (/A:5            (/A:4-6)
                                    /A/foo:6)       # foo inherits

The definition of a 'corresponding subtree' is the set of node-revisions that have the same, constant, relative path below the merge source root and the merge target root.

While the two lines of history linking the root node of the merge source to that of the merge target (via their YCA) will follow any copies (and thus any renames) in their histories, the subtree is considered to be whatever node exists at the given relative path under that root.  If there are revisions at which such a subtree node does not exist at the given relative path, then those revisions are not considered eligible for merging.

  Example:
    now with an initial subtree named 'foo', deleted in r5 in
    branch A, and replaced in r6 by a new 'foo' copied from 'bar'
    (where 'bar' is assumed to exist somewhere but not shown here):

            r4      r5      r6     r7               r8

  A         [x] --- [x] --- [x] -- [x] ------------
  A/foo    /[x] --- [D]         __ [A] ------------
          / /                  /                 \\ r8:
  O     [ ]/            .../bar                   \\ sync
  O/foo [ ]\                                       \\ merge
          \ \                                       \\
  B        \[x] ----------------------------------- [x]
  B/foo     [x] ----------------------------------- [x]

In the r8 merge here, the following changes are merged to the target's subtree 'foo':

  r4 A/foo  -- mod
  r5 A/foo  -- delete
  r6        -- no, not eligible for merge
  r7 A/foo  -- add

I'm not yet clear on the details -- such as whether this would be merged as a single diff between A/foo_at_3 and A/foo_at_7 (ignoring ancestry) or would be broken into multiple revision ranges (because of the gap at r6, and/or because of A/foo_at_3 being unrelated to A/foo_at_7).

INCONSISTENCY

If we look at the diagram above and, instead of the merge shown for r8, imagine we run

  svn merge ^/A/foo wc-B/foo

then it should be clear that the merge will (attempt to) find all changes along the copy-history of A/foo_at_7, back to its youngest common ancestor with B/foo_at_7, and consider those as the revisions/changes eligible for merging.

But if we run the merge shown in the diagram,

  svn merge ^/A wc-B

then the merge will not consider changes along the history of A/foo before r7, where that "same node" was named A/bar.

This difference seems undesirable.  It's not that the current behaviour is necessarily wrong, it is just what it is, but let's see if we can improve it.

PAIRING SUBTREES BY THEIR OWN ANCESTRY

For the future, it may be better to trace each subtree in one branch of the merge to a corresponding subtree in the other branch, with the correspondence determined by following renames.

This would enable the following kinds of history to be followed:

* Subtree 'foo' has been renamed in one or both branches:

  A       [A] ------ [ ] ---
  A/foo   [A]        [x]    \
          /                  \ merge:
  O     [ ]                   \ matches
  O/foo [ ]                    \ A/foo to B/bar
          \      mv foo bar     \
  B       [A] --- [ ]--------- [ ]
  B/foo   [A]     [D]
  B/bar           [A] --------- [x]

* Subtree 'foo' was created after branching, so its youngest common ancestor is later than the merge-root's YCA:

  A       [A] --- [-] --------- [x]
  A/foo           [A] --------- [x]
          /          \             \ merge:
  O     [ ]           \ merge:      \ matches
                       \ adds foo    \ A/foo to B/foo
          \             \             \
  B       [A] --- [D] --- [-] ------ [x]
  B/foo                   [A]        [x]

* Subtree 'foo' did not exist at the YCA but was created initially in some other branch and then copied into both A and B.  The YCA of the subtrees A/foo and B/foo is not itself a subtree of O, nor of A, nor of B, but simply some other location in the repository.

Notice that all of the interesting cases (where behaviour would differ from the current system) have to do with renames.

The location history of the merge *root* node can be uniquely determined by tracing only backwards, because the source root node and target root node are specified in advance.  In contrast, subtrees of the source and target root nodes cannot be uniquely paired using copy history alone because (in general) more than one copy may exist of any node.  Complete pairing requires us to have a way to follow *renames* (as distinct from merely copies), whether automatically or with manual input.

Therefore it seems we need to design this model of subtree handling on top of an assumption that information about renames will be available.

HOW SHOULD SYMMETRIC MERGE BEHAVE?

What does this mean for how the initial implementation of symmetric merge should behave?

My intention so far has been to make the 'sync-like' direction of symmetric merge (that is, same direction as previous merge) behave as a backward-compatible replacement for the current 'sync' (that is, non-reintegrate, merge-tracking, unspecified minimum revision) merge.

However, I don't want to fence us in to a new behaviour with the same backward-compatibility guarantee with regard to subtrees.  So now I'm thinking maybe a plain symmetric merge should error out if there are subtrees (that have different eligible revision ranges from the root).  What would the command-line UI look like?  A plain "merge" errors if subtree merges are needed, while an option forces it to handle them (in the current fixed-relpath way)?  And then in the future we remove the restriction if we come up with a good way to handle them?

That's enough for this email.

Please let me know your thoughts.

- Julian
Received on 2012-06-20 17:00:17 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.