
Re: Subtree mergeinfo -- what I learnt at the Hackathon

From: Paul Burba <ptburba_at_gmail.com>
Date: Wed, 20 Jun 2012 16:06:04 -0400

On Wed, Jun 20, 2012 at 10:59 AM, Julian Foad
<julianfoad_at_btopenworld.com> wrote:
> Warning: long email.  Merge gurus and enthusiasts please comment!
>
> At the Hackathon last week, the biggest single topic I discussed with others was the handling of subtree merges: both the current handling (which Paul told me a lot about), and how we could model them in a more pure/general/consistent way in future (which Brane helped me think about).
>
> This is my understanding.
>
>
> SUMMARY
>
> The current handling is that a normal 'sync' merge will merge any eligible revisions from each subtree.  It uses relative-path equivalence as the definition of the 'corresponding' subtree.
>
> For the future, it may be better to match subtrees in one branch of the merge to subtrees in the other branch, with the correspondence determined by following their own lines of history, including renames.  The merging of a subtree would then be consistent with what would currently happen if you choose to merge that subtree by itself, as the root of a merge -- that is, its location history *would* be followed.
>
> We need to decide how the 'Symmetric merge' should handle subtrees, initially.  This decision should be made in light of both the need for backward compatibility and thought about where the merge capability is headed.
>
>
> CURRENT HANDLING OF SUBTREE MERGEINFO
>
> An ordinary 'sync' merge that considers mergeinfo will consider mergeinfo on subtrees as well as mergeinfo on the merge target root node.  For each target subtree, it will merge any 'eligible' revisions from the corresponding subtree in the source.
>
>   Example:
>     with a merge-root node (named A in the source, B in the target,
>     and O at the YCA), and a subtree (named 'foo' at all times):
>
>   A         [x] --- [x] --- [x] -------------------
>   A/foo     [x]     [x]     [x]                 \
>            /                   \ r7: some        \ r8:
>   O     [ ] r4:     r5:     r6: \ set of          \sync
>   O/foo [ ] cp+mod  mod     mod  \ subtree         \merge
>            \                      \ merges          \
>   B         [x] ------------------ [x] ------------ [x]
>   B/foo     [x]                    [x]              [x]
>   mergeinfo ()                     (/A:5            (/A:4-6)
>                                     /A/foo:6)       # foo inherits
>
> The definition of a 'corresponding subtree' is the set of node-revisions that have the same, constant, relative path below the merge source root and the merge target root.
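
As a rough illustration of eligibility under this fixed-relative-path pairing, here is a minimal Python sketch. It is a toy model, not Subversion's implementation: the function names, the dictionary layout, and the reading of the first diagram's mergeinfo (explicit mergeinfo on B/foo replaces what it would inherit from B) are all assumptions made for illustration.

```python
def effective_merged(mergeinfo, target, source):
    """Revisions of `source` recorded as merged into `target` (toy model).

    Uses the nearest explicit mergeinfo on `target` or one of its parents,
    walking up both paths in step, i.e. a fixed relative path with no
    rename tracking.
    """
    while True:
        if target in mergeinfo:
            return set(mergeinfo[target].get(source, ()))
        if "/" not in target or "/" not in source:
            return set()
        target, t_name = target.rsplit("/", 1)
        source, s_name = source.rsplit("/", 1)
        if t_name != s_name:
            return set()


def eligible(mergeinfo, target, source, source_revs):
    """Source revisions not yet recorded as merged into `target`."""
    return sorted(set(source_revs) - effective_merged(mergeinfo, target, source))


# Mergeinfo read literally from the first diagram (assumed layout).
before_r8 = {"B": {"A": {5}}, "B/foo": {"A/foo": {6}}}
after_r8 = {"B": {"A": {4, 5, 6}}}  # B/foo has no explicit mergeinfo, so it inherits

print(eligible(before_r8, "B", "A", range(4, 7)))          # [4, 6]
print(eligible(before_r8, "B/foo", "A/foo", range(4, 7)))  # [4, 5]
print(eligible(after_r8, "B/foo", "A/foo", range(4, 7)))   # []
```

In this model the root and the subtree need different revisions before r8, which is what makes the r8 sync merge a subtree merge.
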
>
> While the two lines of history linking the root node of the merge source to that of the merge target (via their YCA) will follow any copies (and thus any renames) in their histories, the subtree is considered to be whatever node exists at the given relative path under that root.  If there are revisions at which such a subtree node does not exist at the given relative path, then those revisions are not considered eligible for merging.
>
>   Example:
>     now with an initial subtree named 'foo', deleted in r5 in
>     branch A, and replaced in r7 by a new 'foo' copied from 'bar'
>     (where 'bar' is assumed to exist somewhere but not shown here):
>
>             r4      r5      r6     r7               r8
>
>   A         [x] --- [x] --- [x] -- [x] ------------
>   A/foo    /[x] --- [D]         __ [A] ------------
>           / /                  /                 \\ r8:
>   O     [ ]/            .../bar                   \\ sync
>   O/foo [ ]\                                       \\ merge
>           \ \                                       \\
>   B        \[x] ----------------------------------- [x]
>   B/foo     [x] ----------------------------------- [x]
>
> In the r8 merge here, the following changes are merged to the target's subtree 'foo':
>
>   r4 A/foo  -- mod
>   r5 A/foo  -- delete
>   r6        -- no, not eligible for merge
>   r7 A/foo  -- add
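
The list above can be reproduced with a small Python sketch. This is a toy model under a stated assumption, not the real algorithm: a revision r counts as eligible for the subtree if a node exists at the fixed relative path either just before r or at r, so a delete still counts but a revision where the path is absent on both sides does not. The function name and the existence table are hypothetical.

```python
def eligible_subtree_revisions(exists_at, first_rev, last_rev):
    """Revisions whose change can touch the node at a fixed relative path.

    Assumption: r is eligible if the node exists at r - 1 or at r.
    """
    return [
        r
        for r in range(first_rev, last_rev + 1)
        if exists_at.get(r - 1, False) or exists_at.get(r, False)
    ]


# Existence of 'foo' under the merge source root, per revision.
# r3 follows the root's history back to O, where O/foo exists.
foo_exists = {3: True, 4: True, 5: False, 6: False, 7: True}

print(eligible_subtree_revisions(foo_exists, 4, 7))  # [4, 5, 7]
```
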
>
> I'm not yet clear on the details -- such as whether this would be merged as a single diff between A/foo@3 and A/foo@7 (ignoring ancestry) or would be broken into multiple revision ranges (because of the gap at r6, and/or because of A/foo@3 being unrelated to A/foo@7).

Currently it's the former: this is merged as a single diff (i.e. there
is only one editor drive), something like this (it's not clear whether
B/foo has any explicit mergeinfo of its own prior to this merge; I'm
assuming not):

>svn merge ^/A B
--- Merging r4 through r7 into 'B':
R B\foo
--- Recording mergeinfo for merge of r4 through r7 into 'B':
 U B

> INCONSISTENCY
>
> If we look at the diagram above and, instead of the merge shown for r8, imagine we run
>
>   svn merge ^/A/foo wc-B/foo
>
> then it should be clear that the merge will (attempt to) find all changes along the copy-history of A/foo@7, back to its youngest common ancestor with B/foo@7, and consider those as the revisions/changes eligible for merging.

That assumes they have a common ancestor; since you did not specify
where 'bar' came from, they might not. If they don't share ancestry
then a sync merge will fail (on trunk since your change in r1215273):

>svn merge ^/A/foo B\foo
..\..\..\subversion\svn\main.c:2741: (apr_err=205000)
svn: E205000: Try 'svn help merge' for more information
..\..\..\subversion\svn\merge-cmd.c:454: (apr_err=205000)
svn: E205000: Source and target must be different but related branches
..\..\..\subversion\svn\util.c:1458: (apr_err=205000)
svn: E205000: Source and target have no common ancestor:
'%ROOT_URL%/A/foo@head' and 'B\foo@unspecified'

But yes, if they do have a common ancestor, then all the changes
prior to the rename and after the rename are merged.

> But if we run the merge shown in the diagram,
>
>   svn merge ^/A wc-B
>
> then the merge will not consider changes along the history of A/foo before r7, where that "same node" was named A/bar.
>
> This difference seems undesirable.  It's not that the current behaviour is necessarily wrong, it is just what it is, but let's see if we can improve it.
>
>
> PAIRING SUBTREES BY THEIR OWN ANCESTRY
>
> For the future, it may be better to trace each subtree in one branch of the merge to a corresponding subtree in the other branch, with the correspondence determined by following renames.
>
> This would enable the following kinds of history to be followed:
>
> * Subtree 'foo' has been renamed in one or both branches:
>
>   A       [A] ------ [ ] ---
>   A/foo   [A]        [x]    \
>           /                  \ merge:
>   O     [ ]                   \ matches
>   O/foo [ ]                    \ A/foo to B/bar
>           \      mv foo bar     \
>   B       [A] --- [ ]--------- [ ]
>   B/foo   [A]     [D]
>   B/bar           [A] --------- [x]
>
>
> * Subtree 'foo' was created after branching, so its youngest common ancestor is later than the merge-root's YCA:
>
>   A       [A] --- [-] --------- [x]
>   A/foo           [A] --------- [x]
>           /          \             \ merge:
>   O     [ ]           \ merge:      \ matches
>                        \ adds foo    \ A/foo to B/foo
>           \             \             \
>   B       [A] --- [D] --- [-] ------ [x]
>   B/foo                   [A]        [x]
>
>
> * Subtree 'foo' did not exist at the YCA but was created initially in some other branch and then copied into both A and B.  The YCA of the subtrees A/foo and B/foo is not itself a subtree of O, nor of A, nor of B, but simply some other location in the repository.
>
> Notice that all of the interesting cases (where behaviour would differ from the current system) have to do with renames.
>
> The location history of the merge *root* node can be uniquely determined by tracing only backwards, because the source root node and target root node are specified in advance.  In contrast, subtrees of the source and target root nodes cannot be uniquely paired using copy history alone because (in general) more than one copy may exist of any node.  Complete pairing requires us to have a way to follow *renames* (as distinct from merely copies), whether automatically or with manual input.
>
> Therefore it seems we need to design this model of subtree handling on top of an assumption that information about renames will be available.
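
The backward tracing described above can be sketched in Python. This is a simplified model, not Subversion's API: the `copied_from` table, the function names, and the revision numbers are hypothetical, and each path is assumed to name a single line of history. It covers only the easy half of the problem, tracing backwards from two given endpoints through copies. It does not solve the forward pairing problem, which is where rename information would be needed.

```python
def previous_location(path, copied_from):
    """Where `path` lived before the nearest copy of it or of a parent."""
    parts = path.split("/")
    for i in range(len(parts), 0, -1):
        prefix = "/".join(parts[:i])
        if prefix in copied_from:
            src_path, src_rev = copied_from[prefix]
            return "/".join([src_path] + parts[i:]), src_rev
    return None


def location_history(path, rev, copied_from):
    """Trace (path, rev) backwards through copies, youngest first."""
    history = [(path, rev)]
    while (prev := previous_location(history[-1][0], copied_from)) is not None:
        history.append(prev)
    return history


def youngest_common_ancestor(a, b, copied_from):
    """First location on a's history that also lies on b's history."""
    hist_b = dict(location_history(*b, copied_from))
    for path, rev in location_history(*a, copied_from):
        if path in hist_b:
            return path, min(rev, hist_b[path])
    return None


# Rename case: both branches copied from O@3 and 'foo' renamed to 'bar'
# in B (revision numbers are made up for the example).
copied_from = {"A": ("O", 3), "B": ("O", 3), "B/bar": ("B/foo", 5)}

print(location_history("B/bar", 8, copied_from))
# [('B/bar', 8), ('B/foo', 5), ('O/foo', 3)]
print(youngest_common_ancestor(("A/foo", 8), ("B/bar", 8), copied_from))
# ('O/foo', 3)
```
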
>
>
> HOW SHOULD SYMMETRIC MERGE BEHAVE?
>
> What does this mean for how the initial implementation of symmetric merge should behave?
>
> My intention so far has been to make the 'sync-like' direction of symmetric merge (that is, same direction as previous merge) behave as a backward-compatible replacement for the current 'sync' (that is, non-reintegrate, merge-tracking, unspecified minimum revision) merge.
>
> However, I don't want to fence us in to a new behaviour with the same backward-compatibility guarantee with regard to subtrees.  So now I'm thinking maybe a plain symmetric merge should error out if there are subtrees (that have different eligible revision ranges from the root).
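
To make the proposed restriction concrete, here is a minimal Python sketch of such a check. Everything in it is hypothetical (the exception name, the function, and the way eligible revisions are passed in). It shows only the rule "refuse if any subtree's eligible revisions differ from the root's", not how those revisions would be computed or how such subtrees would be identified.

```python
class SubtreeMergeNeeded(Exception):
    """Raised when a plain merge would need per-subtree handling."""


def check_plain_merge(root_eligible, subtree_eligible):
    """Raise if any subtree's eligible revisions differ from the root's."""
    differing = sorted(
        path
        for path, revs in subtree_eligible.items()
        if set(revs) != set(root_eligible)
    )
    if differing:
        raise SubtreeMergeNeeded(
            "subtrees need different revisions than the root: "
            + ", ".join(differing)
        )


# A subtree that matches the root passes silently.
check_plain_merge([4, 6], {"B/foo": [4, 6]})

# A subtree that differs is rejected.
try:
    check_plain_merge([4, 6], {"B/foo": [4, 5]})
except SubtreeMergeNeeded as err:
    print(err)
```
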

Do these subtrees with "different eligible revision ranges from the
root" always have explicit mergeinfo or are you thinking of some new
method to identify such subtrees? If the former, then renamed
subtrees will be handled inconsistently depending on whether or not
they have explicit mergeinfo, e.g. if there is no explicit mergeinfo
on the subtree then it's the same old diff-and-apply at the root, if
there is subtree mergeinfo, then an error is thrown -- IMHO this is a
recipe for total confusion.

Assuming for a moment that the subtrees in question do have explicit
mergeinfo, then do you mean to error out only if the "different
eligible revision ranges from the root" are reflective of a rename or
in *every* case where the subtree needs different ranges than the
root? The latter seems needlessly restrictive.

> What would the command-line UI look like?  A plain "merge" errors if subtree merges are needed, while an option forces it to handle them (in the current fixed-relpath way)?  And then in the future we remove the restriction if we come up with a good way to handle them?

Maybe the best route is to leave merge as is for 1.8, and let the
--symmetric option become a "real" option? Admittedly both approaches
strike me as poor; is the real problem that the symmetric merge work
is not quite complete?

-- 
Paul T. Burba
CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development
Skype: ptburba
> That's enough for this email.
>
> Please let me know your thoughts.
>
> - Julian
Received on 2012-06-20 22:06:35 CEST

This is an archived mail posted to the Subversion Dev mailing list.
