Re: Symmetric Merge -- Algorithm

Date: Fri, 20 Apr 2012 13:56:09 +0100 (BST)

Cristian Amarie wrote:

>>   1. Find the latest rev of A synced to B and of B synced to A.
>>   2. Choose a base.
> For me 2 this is an important point regarding obtaining of the same
> result.
>
> Having
>   / A1 ... Ax ... An-1  -> An  (merge B into A produces An)
> O
>   \ B1 ... By ... Bm-1 -> Bm (merge A into B produces Bm)
> (Ab and Bb BASE candidates)

Hi Cristian.  First, let me see if I understand you.  You are talking about any graph consisting of a sequence of 'complete' merges[1] between the two branches A and B.  Is this a concrete example, where n=2 and m=3?

/ -- p ---- q ----- A1 -- s ----- A2
O       \      \      /      \      /
\ --- B1 --- B2 -- r ----- B3 -- t

Here, p/q/r/s/t means a change that is not a merge.  The p/q/r/s/t states are the base candidates, not Ab and Bb.

Did you mean, "(merge B into A produces Ax, where 1 <= x <= n)" and "(merge A into B produces By, where 1 <= y <= m)"?

>
> we get the possible paths of walking Ax->An and By->Bm.

By "paths of walking" you mean following the "merge arrows" (the '/' and '\' in my graph) and also following the "natural history" of a branch (e.g. B1 -> B2)?

So we get the possible paths of walking A1 -> A2, which are:

A1 -> s -> A2
A1 -> s -> B3 -> t -> A2

and the possible paths of walking (let's choose y=2) B2 -> B3, which are:

B2 -> r -> B3
B2 -> r -> A1 -> s -> B3

I'll stop here, for now.

- Julian

> Is it feasible to have the condition An == Bm as predicate and select
> the base candidates which verifies that there is a path (at least one?)
> for each of Ax>An and By>Bm and the result is the same?
>
> Now I suppose this cannot be know in advance, but perhaps we can
> backtrack if a pair does not satisfy? (Say for a node Ax we have a set
> of B revisions synced to A, By being the latest, but if Ax By cannot
> get An == Bm, maybe can be picked another pair (not both the latest
> revA synced to B and the other one). Again, I'm not seeing the big
> picture here - judging only from the "symmetric" point of view.
>
>>   3. Identify cherry-picks.
> ...
>>     [TODO] In what cases do the selection methods give different
>>     results? Only after a criss-cross merge?
> If we have candidate BASE being A1 and B1, and criss-cross A2>B3
> and B2>A3
>
>   / A1 ... A2  A3
> O
>   \ B1 ... B2  B3
>
> maybe the new BASE candidates can be A3 and B3 and start the
> algorithm over from here?
> I mean, A3 is the latest revA synced to B and B3 latest revB synced
> to A, right ?
>
> - Cristian
>
> On Mon, Apr 16, 2012 at 7:25 PM, Julian Foad <julian.foad_at_wandisco.com>
> wrote:
>> I have written out how I think a large part of the symmetric merge
>> algorithm should go, in more detail.  Review and other feedback would
>> be welcome.
>>
>> We need to write out the algorithm in this level of detail and a
>> little more detail, and then write several new functions for
>> implementing parts of it.  I had until recently been thinking that I
>> would be able to re-work most of the existing functions to accomodate
>> what we need, and I have made some improvements along the way, but
>> that is proving too difficult.  (I started making some notes in
> things
>> that I think we should be looking to change in the existing merge
>> code.)
>>
>> At this point I haven't included processing of subtree mergeinfo in
>> the algorithm description, though I have that in mind too.
>>
>> I hope your mail viewers preserve indentation and line breaks, as I've
>> used indentation (space characters only) for formatting.
>>
>>
>> Symmetric Merge Algorithm
>>
>>  1. Find the latest rev of A synced to B and of B synced to A.
>>  2. Choose a base.
>>  3. Identify cherry-picks.
>>  4. Break into 3-way merges, skipping the cherry-picks.
>>  5. Perform the 3-way merges and mergeinfo addition.
>>
>>
>> In more detail.
>>
>>  1. Find the latest revision of A synced to B and the latest revision of
>>    B synced to A.
>>
>>    This means, for the A to B direction, the youngest revision 'rN'
>>    on A at which all revisions of A, up to and including rN, are now
>>    recorded as having been merged into B. And "now recorded"
> means
>>    recorded in the version of B that is being used as the merge
>>    target. Usually the result corresponds to the most recent merge
>>    into B from A, but not necessarily, because later revisions from
>>    A might previously have been cherry-picked into B.
>>
>>    We consider only direct A <-> B mergeinfo. (Consideration of a
>>    merge graph extending to other branches is future work. Since we
>>    record mergeinfo transitively, considering only direct A<->B
>>    mergeinfo already suffices for some 3-branch scenarios too, but
>>    not all such scenarios.)
>>
>>    In:
>>        A_tip: pathrev
>>        B_wc:  wc-abspath
>>        (Note: B is a WC "actual" tree including any local mods.)
>>
>>    Out:
>>        A_base: pathrev
>>        B_base: pathrev
>>        (Note: B_base can't possibly be B if B has local mods.)
>>
>>    Implementation:
>>        Currently:
>>            find_base_on_source()
>>            find_base_on_target()
>>        TODO: fill in the details here and re-visit the implementation.
>>
>>
>>  2. Choose a base.
>>
>>    Choose the latest revision of A synced to B or the latest revision
>>    of B synced to A.
>>
>>    Each candidate base is a common ancestor of the source and target,
>>    if ancestry means following either the direct line of descent or
>>    the arrows that indicate complete merges.
>>
>>    Choose the 'more recent' one in some sense -- be it time of
> commit,
>>    or number of changes since then, or some other measure.
>>
>>    [TODO] In what cases do the selection methods give different
>>    results? Only after a criss-cross merge?
>>
>>    In:
>>        A_base: pathrev
>>        B_base: pathrev
>>
>>    Out:
>>        base: pathrev
>>
>>    Implementation:
>>        Currently, in find_symmetric_merge():
>>            (A_base->rev > B_base->rev) ? A_base : B_base
>>
>>
>>  3. Identify cherry-picks.
>>
>>    Find changes along the source side of the merge that are already
>>    'in' the target.
>>
>>    Look for merges in both directions (from current source to current
>>    target ("forward") and from current target to current source
>>    ("backward")). For each merge:
>>        Break down the merge into its component logical changes.
>>        If the merge consists entirely of logical changes that are not
>>        in the target, or consists entirely of logical changes that
>>        are in the target, treat it as a unit -- a cherry pick.
>>        Otherwise, fall back to the user: report to the user which
>>        logical changes that merge includes, and suggest that the user
>>        run more specific merges that pull only the subset of logical
>>        changes they want.
>>
>>  3a. Create an (implicit or explicit) list of indivisible changes
>>     along each side of the merge, since the BASE.  For example:
>>
>>          Source side           Target side
>>          B_at_10:A_at_15             B_at_10:B_at_11  (B_at_10 is BASE)
>>          A_at_15:A_at_20             B_at_11:B_at_12
>>          ...                   ...
>>          A_at_26:A_at_27             B_at_29:B_at_30
>>                                B_at_30:B_wc  (B_at_30 is B_wc's origin)
>>
>>    In:
>>        base
>>        A_tip
>>        B_wc
>>
>>    Out:
>>        A_changes: a list of (URL1_at_REV1 : URL2_at_REV2) pairs?
>>                Or, more compactly, BASE plus a list of REVs
>>                that index into the branch history?
>>        B_changes: the same
>>
>>    Note: The first change in one of the lists might be a cross-branch
>>        change (from 'BASE' on the target branch to 'MID' on
> the source
>>        branch), whereas all subsequent changes are changes along a
>>        branch.
>>
>>    Note: The B_changes output needs to be able to represent B_wc as
>>        its end point, implicitly or explicitly.
>>
>>    Note: These lists need to reference changes in such a way that we
>>        can fetch and examine the changes, at least in terms of their
>>        mergeinfo and whether they're operative.
>>
>>    TODO: Should the outputs identify each individual change (revision)
>>        as operative or inoperative, or is it acceptable to output
>>        revisions (or ranges of revs) that are not all known to be
>>        operative?
>>
>>  3b. Discover the changes from A that have been merged to B since BASE.
>>
>>    In:
>>        A_changes
>>        B_changes
>>
>>    Out:
>>        A_changes_to_skip: a subset of A_changes
>>        A_changes_to_warn: a subset of A_changes
>>
>>    Implementation phase 1:
>>
>>    This detects direct cherry-picks in the same direction as the
>>    present merge, which is all that Subversion 1.7 supports.
>>
>>    Fetch mergeinfo on B_wc; note any changes from A that are recorded
>>    as merged to B (since BASE, and directly) as "already on B".
>>
>>    Implementation phase 2:
>>
>>    This adds detection of cherry-picks in the opposite direction,
>>    which is new functionality.
>>
>>    For each change in A_changes:
>>        Fetch the mergeinfo change.
>>        If the mergeinfo did change (it represents a merge into A):
>>        If that merge was a direct merge from B, and from nowhere
>>        else, add this A change onto A_changes_to_skip.  If that is,
>>        instead, a merge from both B and other sources, then add this
>>        A change onto A_changes_to_warn.
>>
>>    Add this to the phase 1 results for A->B.
>>
>>    Note: If the first change is a BASE:MID cross-branch change, it
>>    must first be turned into the equivalent series of source-side
>>    changes, which is a non-trivial exercise.
>>
>>    Implementation phase 3:
>>
>>    This detects both direct and indirect (that is, via a third
>>    branch) cherry-pick merges, in both directions.  This is more
>>    complex, and a simple implementation of it may run slowly in
>>    some scenarios.
>>
>>    For each change in A_changes:
>>        Fetch the mergeinfo change.
>>        If the mergeinfo did change (it represents a merge into A),
>>        break down that merge into its component logical changes,
>>        else take this change on A as the logical change to test.
>>        See if those logical changes are already on B.
>>        If all of those logical changes are on B, add this A change
>>        onto A_changes_to_skip; if some of them are on B, add this
>>        A change onto A_changes_to_warn.
>>
>>
>>  4. Break into 3-way merges, skipping the cherry-picks.
>>
>>    Implementation:
>>
>>        # Define a merge as (base, tip, target).
>>
>>        merges = [(BASE, A_tip, B_wc)]
>>        for change in A_changes_to_skip:
>>            m = find change.rev in merges
>>            replace [m] with [
>>                (m.base, change-1, m.target),
>>                (change, m.tip, m.target) ]
>>            # elide either or both of those if they are no-ops
>>
>>
>>  5. Perform the 3-way merges and mergeinfo addition.
>>
>>    Implementation:
>>
>>        for m in merges:
>>            three_way_merge(m.base, m.tip, m.target)
>>            if m.base is on A:
>>                do a normal mergeinfo addition
>>            else:
>>                mergeinfo += (all source revs from YCA(?) up to m.tip)
>>
>> - Julian
>