# Symmetric Merge -- Algorithm

Date: Mon, 16 Apr 2012 17:25:17 +0100

I have written out how I think a large part of the symmetric merge
algorithm should go, in more detail. Review and other feedback would
be welcome.

We need to write out the algorithm in this level of detail and a
little more detail, and then write several new functions for
implementing parts of it. I had until recently been thinking that I
would be able to re-work most of the existing functions to accomodate
what we need, and I have made some improvements along the way, but
that is proving too difficult. (I started making some notes in
that I think we should be looking to change in the existing merge
code.)

At this point I haven't included processing of subtree mergeinfo in
the algorithm description, though I have that in mind too.

I hope your mail viewers preserve indentation and line breaks, as I've
used indentation (space characters only) for formatting.

Symmetric Merge Algorithm

1. Find the latest rev of A synced to B and of B synced to A.
2. Choose a base.
3. Identify cherry-picks.
4. Break into 3-way merges, skipping the cherry-picks.
5. Perform the 3-way merges and mergeinfo addition.

In more detail.

1. Find the latest revision of A synced to B and the latest revision of
B synced to A.

This means, for the A to B direction, the youngest revision 'rN'
on A at which all revisions of A, up to and including rN, are now
recorded as having been merged into B. And "now recorded" means
recorded in the version of B that is being used as the merge
target. Usually the result corresponds to the most recent merge
into B from A, but not necessarily, because later revisions from
A might previously have been cherry-picked into B.

We consider only direct A <-> B mergeinfo. (Consideration of a
merge graph extending to other branches is future work. Since we
record mergeinfo transitively, considering only direct A<->B
mergeinfo already suffices for some 3-branch scenarios too, but
not all such scenarios.)

In:
A_tip: pathrev
B_wc: wc-abspath
(Note: B is a WC "actual" tree including any local mods.)

Out:
A_base: pathrev
B_base: pathrev
(Note: B_base can't possibly be B if B has local mods.)

Implementation:
Currently:
find_base_on_source()
find_base_on_target()
TODO: fill in the details here and re-visit the implementation.

2. Choose a base.

Choose the latest revision of A synced to B or the latest revision
of B synced to A.

Each candidate base is a common ancestor of the source and target,
if ancestry means following either the direct line of descent or
the arrows that indicate complete merges.

Choose the 'more recent' one in some sense -- be it time of commit,
or number of changes since then, or some other measure.

[TODO] In what cases do the selection methods give different
results? Only after a criss-cross merge?

In:
A_base: pathrev
B_base: pathrev

Out:
base: pathrev

Implementation:
Currently, in find_symmetric_merge():
(A_base->rev > B_base->rev) ? A_base : B_base

3. Identify cherry-picks.

Find changes along the source side of the merge that are already
'in' the target.

Look for merges in both directions (from current source to current
target ("forward") and from current target to current source
("backward")). For each merge:
Break down the merge into its component logical changes.
If the merge consists entirely of logical changes that are not
in the target, or consists entirely of logical changes that
are in the target, treat it as a unit -- a cherry pick.
Otherwise, fall back to the user: report to the user which
logical changes that merge includes, and suggest that the user
run more specific merges that pull only the subset of logical
changes they want.

3a. Create an (implicit or explicit) list of indivisible changes
along each side of the merge, since the BASE. For example:

Source side Target side
B_at_10:A_at_15 B_at_10:B_at_11 (B_at_10 is BASE)
A_at_15:A_at_20 B_at_11:B_at_12
... ...
A_at_26:A_at_27 B_at_29:B_at_30
B_at_30:B_wc (B_at_30 is B_wc's origin)

In:
base
A_tip
B_wc

Out:
A_changes: a list of (URL1_at_REV1 : URL2_at_REV2) pairs?
Or, more compactly, BASE plus a list of REVs
that index into the branch history?
B_changes: the same

Note: The first change in one of the lists might be a cross-branch
change (from 'BASE' on the target branch to 'MID' on the source
branch), whereas all subsequent changes are changes along a
branch.

Note: The B_changes output needs to be able to represent B_wc as
its end point, implicitly or explicitly.

Note: These lists need to reference changes in such a way that we
can fetch and examine the changes, at least in terms of their
mergeinfo and whether they're operative.

TODO: Should the outputs identify each individual change (revision)
as operative or inoperative, or is it acceptable to output
revisions (or ranges of revs) that are not all known to be
operative?

3b. Discover the changes from A that have been merged to B since BASE.

In:
A_changes
B_changes

Out:
A_changes_to_skip: a subset of A_changes
A_changes_to_warn: a subset of A_changes

Implementation phase 1:

This detects direct cherry-picks in the same direction as the
present merge, which is all that Subversion 1.7 supports.

Fetch mergeinfo on B_wc; note any changes from A that are recorded
as merged to B (since BASE, and directly) as "already on B".

Implementation phase 2:

This adds detection of cherry-picks in the opposite direction,
which is new functionality.

For each change in A_changes:
Fetch the mergeinfo change.
If the mergeinfo did change (it represents a merge into A):
If that merge was a direct merge from B, and from nowhere
else, add this A change onto A_changes_to_skip. If that is,
instead, a merge from both B and other sources, then add this
A change onto A_changes_to_warn.

Add this to the phase 1 results for A->B.

Note: If the first change is a BASE:MID cross-branch change, it
must first be turned into the equivalent series of source-side
changes, which is a non-trivial exercise.

Implementation phase 3:

This detects both direct and indirect (that is, via a third
branch) cherry-pick merges, in both directions. This is more
complex, and a simple implementation of it may run slowly in
some scenarios.

For each change in A_changes:
Fetch the mergeinfo change.
If the mergeinfo did change (it represents a merge into A),
break down that merge into its component logical changes,
else take this change on A as the logical change to test.
See if those logical changes are already on B.
If all of those logical changes are on B, add this A change
onto A_changes_to_skip; if some of them are on B, add this
A change onto A_changes_to_warn.

4. Break into 3-way merges, skipping the cherry-picks.

Implementation:

# Define a merge as (base, tip, target).

merges = [(BASE, A_tip, B_wc)]
for change in A_changes_to_skip:
m = find change.rev in merges
replace [m] with [
(m.base, change-1, m.target),
(change, m.tip, m.target) ]
# elide either or both of those if they are no-ops

5. Perform the 3-way merges and mergeinfo addition.

Implementation:

for m in merges:
three_way_merge(m.base, m.tip, m.target)
if m.base is on A: