Re: Two-phase merge

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Sun, 29 Apr 2012 15:43:42 +0200

Julian Foad wrote:
> Stefan Fuhrmann wrote:
>
>> This one is basically the technical post promised in my
>> "Merge Policies" post. The proposal I make here is not
>> an alternative to e.g. "Symmetric Merge" but rather a
>> refined merge strategy that would also benefit the current
>> merge.
>>
>> Today, the merge algorithm will do the following for a
>> merge from A -> B:
>>
>> * read / evaluate the merge info and change history
>> to find the list of revisions to merge
>> * fragment the list of revisions where they are non-
>> contiguous; fragment = a revision range to merge
>> * iteratively merge all fragments from A to B,
>> optionally resolve conflicts after each fragment
>>
>> In end, we will always merge individual file or property
>> changes but we plan the merge on tree-level. For many
>> file nodes, this will unnecessary intermediate states
>> where the merge got fragmented due to a change to
>> some *other* file.
> Yes. That is likely to be a significant cause of extra conflict in cases where the user has sub-tree mergeinfo or has done some cherry-pick merges and is now doing a "merge everything else" merge.

It also happens in the "back and forth" case:
if the changes in the source are interspersed
with merges from the target, i.e. the merges
from A to B will be excluded from the changes
to be merged from B to A.
>> This increases the probability of
>> merge conflicts and their likelihood to show up again
>> after each conflict resolution.
> (That is: after each phase of the N revision-range phases that the merge got broken into.)
>
>> Moreover, it masks cases
>> where Symmetric Merge might select an optimized
>> merge plan (see wiki for various examples).
> I understand, but I don't know of any examples on the Wiki yet.

E.g. in a merge from A to B, we can simply take the merge
result of the last B to A merge if there were no later changes
on either side and the B -> A merge did not cherry-pick
(with respect to the node in question).
>> I propose to use the following two-phase merge workflow:
>>
>> (1) Planning phase
>> - read / evaluate the merge info and change history
>> to find the list of revisions to merge
>> - break that down for each node changed on the source side
>> - optionally handle renaming etc.
>> - mark tree conflicts (not sure when to signal them to the
>> user and when to resolve them; high-level fragmentation
>> might become necessary)
>>
>> (2) Execution phase
>> - merge changes on a per-node basis, i.e. all changes of
>> that node get merged before the next node
>> - fragmentation may still be necessary for *that file*
>> - given the "merge plan" for a node, alternative merge
>> strategies may be chosen
> OK, the significant change here is not so much the two-phase (although that has merit in itself) but rather turning the processing order inside-out from (conceptually)
>
> REV_RANGES = [contiguous-revision-ranges needed for TREE]
> for REV_RANGE in REV_RANGES:
> 3-way merge REV_RANGE into the whole TREE
>
> to (conceptually)
>
> for each NODE in the TREE:
> MERGES = [find the best sequence of 3-way merges to merge all needed changes into NODE]
> for MERGE in MERGES:
> 3-way-merge MERGE into NODE

Correctly. With the tiny addition that all rename
and tree conflict handling is done in the planning
phase. That makes it repeatable and potentially
reduces the number of conflicts (at most one per
node instead of one per node and merge step).

The second phase is also reduced to purely text-
based processing because all the "structure"
issues have already been resolved. But that might
be more of a mental aid instead of a true reduction
in complexity (we still might need to handle missing
targets on disk etc.).
> This is a good idea in itself. It certainly brings advantages as mentioned above. I'm not sure what, if any, disadvantages it might have.

Memory consumption could be proportional to
the number of changes and renames or such.
Should not be a real-world problem as long as
you don't merge more than a few 100k changes.

A more severe issue might be latency because
the planning phase needs to gather / evaluate
data from large sections of history. But that one
can be improved ;)
> I don't know that the scenarios this helps with are our main concern at this point. It would be good to keep this approach in mind when writing or designing any new merge code. In terms of development effort I think it's more valuable to concentrate first on the advantages to to-and-fro merging (most commonly without subtree merges and without cherry-picks) that the 'symmetric merge' brings.

I think it can become helpful as soon as we
track renames because we need to remap
on a per-node basis.

-- Stefan^2.
Received on 2012-04-29 20:23:10 CEST

This message: [ Message body ]
Next message: Blair Zajac: "Deleting svn-fast-backup"
Previous message: Stefan Fuhrmann: "Re: Merge-relevant information that is hard to come by"
In reply to: Julian Foad: "Re: Two-phase merge"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]