[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Two-phase merge

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Thu, 10 May 2012 12:50:48 +0100 (BST)

On 29 April, Stefan Fuhrmann wrote:
> Julian Foad wrote:
>> Stefan Fuhrmann wrote:
>>> [...] The proposal I make here is not
>>> an alternative to e.g. "Symmetric Merge" but rather a
>>> refined merge strategy that would also benefit the current
>>> merge.
[...]
>>> In end, we will always merge individual file or property
>>> changes but we plan the merge on tree-level. For many
>>> file nodes, this will unnecessary intermediate states
>>> where the merge got fragmented due to a change to
>>> some *other* file.
>>
>> Yes.  That is likely to be a significant cause of extra conflict in cases
>> where the user has sub-tree mergeinfo or has done some cherry-pick merges
>> and is now doing a "merge everything else" merge.
>
> It also happens in the "back and forth" case:

If this problem does arise in basic back-and-forth merging then I would agree it's much more important than I thought,
but I don't get it.  If "it"
refers to fragmenting the merge, then no, each merge in a back-and-forth case doesn't get fragmented.

> if the changes in the source are interspersed
> with merges from the target, i.e. the merges
> from A to B will be excluded from the changes
> to be merged from B to A.

Do you mean like this?

            A2      A3      A4
  A o-------o-------o-------o

   /         \             /

  o           \           /

   \           \         /

  B o---o-------o-------o

        B2      B3      B4  

The merge we're considering is the one shown from B4 to A4.  Changes B2 and B4 are changes in the source, interspersed with B3 which is a merge from the target, that is the result of a merge from A to B.

A2 will be chosen as the base for a 3-way merge, and the two arms of this merge will be (A2:B4) as the source side and (A2:A4) as the target side.  The source side, (A2:B4), is equivalent to the combination of B2 and B4.  It can be a bit tricky to understand why it is equivalent; if so, see <>.  The point is that changes B2 and B4 don't get split up and merged one after the other in this scenario.

>>>  This increases the probability of
>>>  merge conflicts and their likelihood to show up again
>>>  after each conflict resolution.
>> (That is: after each phase of the N revision-range phases that the merge
>> got broken into.)
>>
>>>  Moreover, it masks cases
>>>  where Symmetric Merge might select an optimized
>>>  merge plan (see wiki for various examples).
>>  I understand, but I don't know of any examples on the Wiki yet.
>
> E.g. in a merge from A to B, we can simply take the merge
> result of the last B to A merge if there were no later changes
> on either side and the B -> A merge did not cherry-pick
> (with respect to the node in question).

>>>  I propose to use the following two-phase merge workflow:
>>>
>>>  (1) Planning phase
>>>  - read / evaluate the merge info and change history
>>>     to find the list of revisions to merge
>>>  - break that down for each node changed on the source side
>>>  - optionally handle renaming etc.
>>>  - mark tree conflicts (not sure when to signal them to the
>>>     user and when to resolve them; high-level fragmentation
>>>     might become necessary)
>>>
>>>  (2) Execution phase
>>>  - merge changes on a per-node basis, i.e. all changes of
>>>     that node get merged before the next node
>>>  - fragmentation may still be necessary for *that file*
>>>  - given the "merge plan" for a node, alternative merge
>>>     strategies may be chosen
>>
>> OK, the significant change here is not so much the two-phase (although that
>> has merit in itself) but rather turning the processing order inside-out from
>> (conceptually)
>>
>>     REV_RANGES = [contiguous-revision-ranges needed for TREE]
>>     for REV_RANGE in REV_RANGES:
>>       3-way merge REV_RANGE into the whole TREE
>>
>> to (conceptually)
>>
>>     for each NODE in the TREE:
>>       MERGES = [find the best sequence of 3-way merges to merge all needed
>> changes into NODE]
>>       for MERGE in MERGES:
>>         3-way-merge MERGE into NODE
>
> Correctly. With the tiny addition that all rename
> and tree conflict handling is done in the planning
> phase. That makes it repeatable and potentially
> reduces the number of conflicts (at most one per
> node instead of one per node and merge step).

I get how it reduces conflicts.  What do you mean by 'repeatable'?

> The second phase is also reduced to purely text-
> based processing because all the "structure"
> issues have already been resolved. But that might
> be more of a mental aid instead of a true reduction
> in complexity (we still might need to handle missing
> targets on disk etc.).

>> This is a good idea in itself.  It certainly brings advantages as
>> mentioned above.  I'm not sure what, if any, disadvantages it might have.
>
> Memory consumption could be proportional to
> the number of changes and renames or such.
> Should not be a real-world problem as long as
> you don't merge more than a few 100k changes.
>
> A more severe issue might be latency because
> the planning phase needs to gather / evaluate
> data from large sections of history. But that one
> can be improved ;)

Right.

>> I don't know that the scenarios this helps with are our main concern at
>> this point.  It would be good to keep this approach in mind when writing
>> or  designing any new merge code.  In terms of development effort I think
>> it's  more valuable to concentrate first on the advantages to to-and-fro
>> merging (most  commonly without subtree merges and without cherry-picks)
>> that the  'symmetric merge' brings.
>
> I think it can become helpful as soon as we track renames because
> we need to remap on a per-node basis.

To track a renamed node, we need to be able to recognize and act on the information that each node in a merge will be found at three paths that may all be different: its path in the merge-base (aka source-left) tree, its path in the source (aka source-right) tree, and its path in the target tree.  Is that what you mean by "remap on a per-node basis"?  I think that's something we will need to do anyway, but I don't see that it makes much difference whether we do this once at the planning stage (for a two-phase merge), or do it separately for each sub-merge (for the split-into-revision-ranges method).

- Julian
Received on 2012-05-10 13:51:25 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.