Re: Symmetric Merge -- Algorithm

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 20 Apr 2012 13:56:09 +0100 (BST)

Cristian Amarie wrote:

>> 1. Find the latest rev of A synced to B and of B synced to A.
>> 2. Choose a base.
> For me 2 this is an important point regarding obtaining of the same
> result.
>
> Having
> / A1 ... Ax ... An-1 -> An (merge B into A produces An)
> O
> \ B1 ... By ... Bm-1 -> Bm (merge A into B produces Bm)
> (Ab and Bb BASE candidates)

Hi Cristian. First, let me see if I understand you. You are talking about any graph consisting of a sequence of 'complete' merges[1] between the two branches A and B. Is this a concrete example, where n=2 and m=3?

    / -- p ---- q ----- A1 -- s ----- A2
O   \      \      /      \    /
    \ --- B1 --- B2 -- r ----- B3 -- t

Here, p/q/r/s/t means a change that is not a merge. The p/q/r/s/t states are the base candidates, not Ab and Bb.

Did you mean, "(merge B into A produces Ax, where 1 <= x <= n)" and "(merge A into B produces By, where 1 <= y <= m)"?

[1] <http://wiki.apache.org/subversion/SymmetricMerge#Terminology>

>
> we get the possible paths of walking Ax->An and By->Bm.

By "paths of walking" you mean following the "merge arrows" (the '/' and '\' in my graph) and also following the "natural history" of a branch (e.g. B1 -> B2)?

So we get the possible paths of walking A1 -> A2, which are:

A1 -> s -> A2
A1 -> s -> B3 -> t -> A2

and the possible paths of walking (let's choose y=2) B2 -> B3, which are:

B2 -> r -> B3
B2 -> r -> A1 -> s -> B3

I'll stop here, for now.

- Julian

> Is it feasible to have the condition An == Bm as predicate and select
> the base candidates which verifies that there is a path (at least one?)
> for each of Ax>An and By>Bm and the result is the same?
>
> Now I suppose this cannot be know in advance, but perhaps we can
> backtrack if a pair does not satisfy? (Say for a node Ax we have a set
> of B revisions synced to A, By being the latest, but if Ax By cannot
> get An == Bm, maybe can be picked another pair (not both the latest
> revA synced to B and the other one). Again, I'm not seeing the big
> picture here - judging only from the "symmetric" point of view.
>
>> 3. Identify cherry-picks.
> ...
>> [TODO] In what cases do the selection methods give different
>> results? Only after a criss-cross merge?
> If we have candidate BASE being A1 and B1, and criss-cross A2>B3
> and B2>A3
>
> / A1 ... A2 A3
> O
> \ B1 ... B2 B3
>
> maybe the new BASE candidates can be A3 and B3 and start the
> algorithm over from here?
> I mean, A3 is the latest revA synced to B and B3 latest revB synced
> to A, right ?
>
> - Cristian
>
> On Mon, Apr 16, 2012 at 7:25 PM, Julian Foad <julian.foad_at_wandisco.com>
> wrote:
>> I have written out how I think a large part of the symmetric merge
>> algorithm should go, in more detail. Review and other feedback would
>> be welcome.
>>
>> We need to write out the algorithm in this level of detail and a
>> little more detail, and then write several new functions for
>> implementing parts of it. I had until recently been thinking that I
>> would be able to re-work most of the existing functions to accomodate
>> what we need, and I have made some improvements along the way, but
>> that is proving too difficult. (I started making some notes in
>> <http://wiki.apache.org/subversion/MergeCodeImprovements> about
> things
>> that I think we should be looking to change in the existing merge
>> code.)
>>
>> At this point I haven't included processing of subtree mergeinfo in
>> the algorithm description, though I have that in mind too.
>>
>> I hope your mail viewers preserve indentation and line breaks, as I've
>> used indentation (space characters only) for formatting.
>>
>>
>> Symmetric Merge Algorithm
>>
>> 1. Find the latest rev of A synced to B and of B synced to A.
>> 2. Choose a base.
>> 3. Identify cherry-picks.
>> 4. Break into 3-way merges, skipping the cherry-picks.
>> 5. Perform the 3-way merges and mergeinfo addition.
>>
>>
>> In more detail.
>>
>> 1. Find the latest revision of A synced to B and the latest revision of
>> B synced to A.
>>
>> This means, for the A to B direction, the youngest revision 'rN'
>> on A at which all revisions of A, up to and including rN, are now
>> recorded as having been merged into B. And "now recorded"
> means
>> recorded in the version of B that is being used as the merge
>> target. Usually the result corresponds to the most recent merge
>> into B from A, but not necessarily, because later revisions from
>> A might previously have been cherry-picked into B.
>>
>> We consider only direct A <-> B mergeinfo. (Consideration of a
>> merge graph extending to other branches is future work. Since we
>> record mergeinfo transitively, considering only direct A<->B
>> mergeinfo already suffices for some 3-branch scenarios too, but
>> not all such scenarios.)
>>
>> In:
>> A_tip: pathrev
>> B_wc: wc-abspath
>> (Note: B is a WC "actual" tree including any local mods.)
>>
>> Out:
>> A_base: pathrev
>> B_base: pathrev
>> (Note: B_base can't possibly be B if B has local mods.)
>>
>> Implementation:
>> Currently:
>> find_base_on_source()
>> find_base_on_target()
>> TODO: fill in the details here and re-visit the implementation.
>>
>>
>> 2. Choose a base.
>>
>> Choose the latest revision of A synced to B or the latest revision
>> of B synced to A.
>>
>> Each candidate base is a common ancestor of the source and target,
>> if ancestry means following either the direct line of descent or
>> the arrows that indicate complete merges.
>>
>> Choose the 'more recent' one in some sense -- be it time of
> commit,
>> or number of changes since then, or some other measure.
>>
>> [TODO] In what cases do the selection methods give different
>> results? Only after a criss-cross merge?
>>
>> In:
>> A_base: pathrev
>> B_base: pathrev
>>
>> Out:
>> base: pathrev
>>
>> Implementation:
>> Currently, in find_symmetric_merge():
>> (A_base->rev > B_base->rev) ? A_base : B_base
>>
>>
>> 3. Identify cherry-picks.
>>
>> Find changes along the source side of the merge that are already
>> 'in' the target.
>>
>> Look for merges in both directions (from current source to current
>> target ("forward") and from current target to current source
>> ("backward")). For each merge:
>> Break down the merge into its component logical changes.
>> If the merge consists entirely of logical changes that are not
>> in the target, or consists entirely of logical changes that
>> are in the target, treat it as a unit -- a cherry pick.
>> Otherwise, fall back to the user: report to the user which
>> logical changes that merge includes, and suggest that the user
>> run more specific merges that pull only the subset of logical
>> changes they want.
>>
>> 3a. Create an (implicit or explicit) list of indivisible changes
>> along each side of the merge, since the BASE. For example:
>>
>> Source side Target side
>> B_at_10:A_at_15 B_at_10:B_at_11 (B_at_10 is BASE)
>> A_at_15:A_at_20 B_at_11:B_at_12
>> ... ...
>> A_at_26:A_at_27 B_at_29:B_at_30
>> B_at_30:B_wc (B_at_30 is B_wc's origin)
>>
>> In:
>> base
>> A_tip
>> B_wc
>>
>> Out:
>> A_changes: a list of (URL1_at_REV1 : URL2_at_REV2) pairs?
>> Or, more compactly, BASE plus a list of REVs
>> that index into the branch history?
>> B_changes: the same
>>
>> Note: The first change in one of the lists might be a cross-branch
>> change (from 'BASE' on the target branch to 'MID' on
> the source
>> branch), whereas all subsequent changes are changes along a
>> branch.
>>
>> Note: The B_changes output needs to be able to represent B_wc as
>> its end point, implicitly or explicitly.
>>
>> Note: These lists need to reference changes in such a way that we
>> can fetch and examine the changes, at least in terms of their
>> mergeinfo and whether they're operative.
>>
>> TODO: Should the outputs identify each individual change (revision)
>> as operative or inoperative, or is it acceptable to output
>> revisions (or ranges of revs) that are not all known to be
>> operative?
>>
>> 3b. Discover the changes from A that have been merged to B since BASE.
>>
>> In:
>> A_changes
>> B_changes
>>
>> Out:
>> A_changes_to_skip: a subset of A_changes
>> A_changes_to_warn: a subset of A_changes
>>
>> Implementation phase 1:
>>
>> This detects direct cherry-picks in the same direction as the
>> present merge, which is all that Subversion 1.7 supports.
>>
>> Fetch mergeinfo on B_wc; note any changes from A that are recorded
>> as merged to B (since BASE, and directly) as "already on B".
>>
>> Implementation phase 2:
>>
>> This adds detection of cherry-picks in the opposite direction,
>> which is new functionality.
>>
>> For each change in A_changes:
>> Fetch the mergeinfo change.
>> If the mergeinfo did change (it represents a merge into A):
>> If that merge was a direct merge from B, and from nowhere
>> else, add this A change onto A_changes_to_skip. If that is,
>> instead, a merge from both B and other sources, then add this
>> A change onto A_changes_to_warn.
>>
>> Add this to the phase 1 results for A->B.
>>
>> Note: If the first change is a BASE:MID cross-branch change, it
>> must first be turned into the equivalent series of source-side
>> changes, which is a non-trivial exercise.
>>
>> Implementation phase 3:
>>
>> This detects both direct and indirect (that is, via a third
>> branch) cherry-pick merges, in both directions. This is more
>> complex, and a simple implementation of it may run slowly in
>> some scenarios.
>>
>> For each change in A_changes:
>> Fetch the mergeinfo change.
>> If the mergeinfo did change (it represents a merge into A),
>> break down that merge into its component logical changes,
>> else take this change on A as the logical change to test.
>> See if those logical changes are already on B.
>> If all of those logical changes are on B, add this A change
>> onto A_changes_to_skip; if some of them are on B, add this
>> A change onto A_changes_to_warn.
>>
>>
>> 4. Break into 3-way merges, skipping the cherry-picks.
>>
>> Implementation:
>>
>> # Define a merge as (base, tip, target).
>>
>> merges = [(BASE, A_tip, B_wc)]
>> for change in A_changes_to_skip:
>> m = find change.rev in merges
>> replace [m] with [
>> (m.base, change-1, m.target),
>> (change, m.tip, m.target) ]
>> # elide either or both of those if they are no-ops
>>
>>
>> 5. Perform the 3-way merges and mergeinfo addition.
>>
>> Implementation:
>>
>> for m in merges:
>> three_way_merge(m.base, m.tip, m.target)
>> if m.base is on A:
>> do a normal mergeinfo addition
>> else:
>> mergeinfo += (all source revs from YCA(?) up to m.tip)
>>
>> - Julian
>
Received on 2012-04-20 14:56:46 CEST

This message: [ Message body ]
Next message: Julian Foad: "Re: Symmetric Merge -- Algorithm"
Previous message: Julian Foad: "Re: Merge policies"
In reply to: Cristian Amarie: "Re: Symmetric Merge -- Algorithm"
Next in thread: Julian Foad: "Re: Symmetric Merge -- Algorithm"
Reply: Julian Foad: "Re: Symmetric Merge -- Algorithm"
Reply: Cristian Amarie: "Re: Symmetric Merge -- Algorithm"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]