[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Symmetric Merge -- Algorithm

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Mon, 16 Apr 2012 17:25:17 +0100

I have written out how I think a large part of the symmetric merge
algorithm should go, in more detail. Review and other feedback would
be welcome.

We need to write out the algorithm in this level of detail and a
little more detail, and then write several new functions for
implementing parts of it. I had until recently been thinking that I
would be able to re-work most of the existing functions to accomodate
what we need, and I have made some improvements along the way, but
that is proving too difficult. (I started making some notes in
<http://wiki.apache.org/subversion/MergeCodeImprovements> about things
that I think we should be looking to change in the existing merge
code.)

At this point I haven't included processing of subtree mergeinfo in
the algorithm description, though I have that in mind too.

I hope your mail viewers preserve indentation and line breaks, as I've
used indentation (space characters only) for formatting.

Symmetric Merge Algorithm

 1. Find the latest rev of A synced to B and of B synced to A.
 2. Choose a base.
 3. Identify cherry-picks.
 4. Break into 3-way merges, skipping the cherry-picks.
 5. Perform the 3-way merges and mergeinfo addition.

In more detail.

 1. Find the latest revision of A synced to B and the latest revision of
    B synced to A.

    This means, for the A to B direction, the youngest revision 'rN'
    on A at which all revisions of A, up to and including rN, are now
    recorded as having been merged into B. And "now recorded" means
    recorded in the version of B that is being used as the merge
    target. Usually the result corresponds to the most recent merge
    into B from A, but not necessarily, because later revisions from
    A might previously have been cherry-picked into B.

    We consider only direct A <-> B mergeinfo. (Consideration of a
    merge graph extending to other branches is future work. Since we
    record mergeinfo transitively, considering only direct A<->B
    mergeinfo already suffices for some 3-branch scenarios too, but
    not all such scenarios.)

    In:
        A_tip: pathrev
        B_wc: wc-abspath
        (Note: B is a WC "actual" tree including any local mods.)

    Out:
        A_base: pathrev
        B_base: pathrev
        (Note: B_base can't possibly be B if B has local mods.)

    Implementation:
        Currently:
            find_base_on_source()
            find_base_on_target()
        TODO: fill in the details here and re-visit the implementation.

 2. Choose a base.

    Choose the latest revision of A synced to B or the latest revision
    of B synced to A.

    Each candidate base is a common ancestor of the source and target,
    if ancestry means following either the direct line of descent or
    the arrows that indicate complete merges.

    Choose the 'more recent' one in some sense -- be it time of commit,
    or number of changes since then, or some other measure.

    [TODO] In what cases do the selection methods give different
    results? Only after a criss-cross merge?

    In:
        A_base: pathrev
        B_base: pathrev

    Out:
        base: pathrev

    Implementation:
        Currently, in find_symmetric_merge():
            (A_base->rev > B_base->rev) ? A_base : B_base

 3. Identify cherry-picks.

    Find changes along the source side of the merge that are already
    'in' the target.

    Look for merges in both directions (from current source to current
    target ("forward") and from current target to current source
    ("backward")). For each merge:
        Break down the merge into its component logical changes.
        If the merge consists entirely of logical changes that are not
        in the target, or consists entirely of logical changes that
        are in the target, treat it as a unit -- a cherry pick.
        Otherwise, fall back to the user: report to the user which
        logical changes that merge includes, and suggest that the user
        run more specific merges that pull only the subset of logical
        changes they want.

 3a. Create an (implicit or explicit) list of indivisible changes
     along each side of the merge, since the BASE. For example:

          Source side Target side
          B_at_10:A_at_15 B_at_10:B_at_11 (B_at_10 is BASE)
          A_at_15:A_at_20 B_at_11:B_at_12
          ... ...
          A_at_26:A_at_27 B_at_29:B_at_30
                                B_at_30:B_wc (B_at_30 is B_wc's origin)

    In:
        base
        A_tip
        B_wc

    Out:
        A_changes: a list of (URL1_at_REV1 : URL2_at_REV2) pairs?
                Or, more compactly, BASE plus a list of REVs
                that index into the branch history?
        B_changes: the same

    Note: The first change in one of the lists might be a cross-branch
        change (from 'BASE' on the target branch to 'MID' on the source
        branch), whereas all subsequent changes are changes along a
        branch.

    Note: The B_changes output needs to be able to represent B_wc as
        its end point, implicitly or explicitly.

    Note: These lists need to reference changes in such a way that we
        can fetch and examine the changes, at least in terms of their
        mergeinfo and whether they're operative.

    TODO: Should the outputs identify each individual change (revision)
        as operative or inoperative, or is it acceptable to output
        revisions (or ranges of revs) that are not all known to be
        operative?

 3b. Discover the changes from A that have been merged to B since BASE.

    In:
        A_changes
        B_changes

    Out:
        A_changes_to_skip: a subset of A_changes
        A_changes_to_warn: a subset of A_changes

    Implementation phase 1:

    This detects direct cherry-picks in the same direction as the
    present merge, which is all that Subversion 1.7 supports.

    Fetch mergeinfo on B_wc; note any changes from A that are recorded
    as merged to B (since BASE, and directly) as "already on B".

    Implementation phase 2:

    This adds detection of cherry-picks in the opposite direction,
    which is new functionality.

    For each change in A_changes:
        Fetch the mergeinfo change.
        If the mergeinfo did change (it represents a merge into A):
        If that merge was a direct merge from B, and from nowhere
        else, add this A change onto A_changes_to_skip. If that is,
        instead, a merge from both B and other sources, then add this
        A change onto A_changes_to_warn.

    Add this to the phase 1 results for A->B.

    Note: If the first change is a BASE:MID cross-branch change, it
    must first be turned into the equivalent series of source-side
    changes, which is a non-trivial exercise.

    Implementation phase 3:

    This detects both direct and indirect (that is, via a third
    branch) cherry-pick merges, in both directions. This is more
    complex, and a simple implementation of it may run slowly in
    some scenarios.

    For each change in A_changes:
        Fetch the mergeinfo change.
        If the mergeinfo did change (it represents a merge into A),
        break down that merge into its component logical changes,
        else take this change on A as the logical change to test.
        See if those logical changes are already on B.
        If all of those logical changes are on B, add this A change
        onto A_changes_to_skip; if some of them are on B, add this
        A change onto A_changes_to_warn.

 4. Break into 3-way merges, skipping the cherry-picks.

    Implementation:

        # Define a merge as (base, tip, target).

        merges = [(BASE, A_tip, B_wc)]
        for change in A_changes_to_skip:
            m = find change.rev in merges
            replace [m] with [
                (m.base, change-1, m.target),
                (change, m.tip, m.target) ]
            # elide either or both of those if they are no-ops

 5. Perform the 3-way merges and mergeinfo addition.

    Implementation:

        for m in merges:
            three_way_merge(m.base, m.tip, m.target)
            if m.base is on A:
                do a normal mergeinfo addition
            else:
                mergeinfo += (all source revs from YCA(?) up to m.tip)

- Julian
Received on 2012-04-16 18:26:13 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.