[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Two-phase merge

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 25 Apr 2012 13:40:29 +0100 (BST)

Stefan Fuhrmann wrote:

> This one is basically the technical post promised in my
> "Merge Policies" post. The proposal I make here is not
> an alternative to e.g. "Symmetric Merge" but rather a
> refined merge strategy that would also benefit the current
> merge.
>
> Today, the merge algorithm will do the following for a
> merge from A -> B:
>
> * read / evaluate the merge info and change history
>   to find the list of revisions to merge
> * fragment the list of revisions where they are non-
>   contiguous; fragment = a revision range to merge
> * iteratively merge all fragments from A to B,
>   optionally resolve conflicts after each fragment
>
> In end, we will always merge individual file or property
> changes but we plan the merge on tree-level. For many
> file nodes, this will unnecessary intermediate states
> where the merge got fragmented due to a change to
> some *other* file.

Yes.  That is likely to be a significant cause of extra conflict in cases where the user has sub-tree mergeinfo or has done some cherry-pick merges and is now doing a "merge everything else" merge.

> This increases the probability of
> merge conflicts and their likelihood to show up again
> after each conflict resolution.

(That is: after each phase of the N revision-range phases that the merge got broken into.)

> Moreover, it masks cases
> where Symmetric Merge might select an optimized
> merge plan (see wiki for various examples).

I understand, but I don't know of any examples on the Wiki yet.

> I propose to use the following two-phase merge workflow:
>
> (1) Planning phase
> - read / evaluate the merge info and change history
>   to find the list of revisions to merge
> - break that down for each node changed on the source side
> - optionally handle renaming etc.
> - mark tree conflicts (not sure when to signal them to the
>   user and when to resolve them; high-level fragmentation
>   might become necessary)
>
> (2) Execution phase
> - merge changes on a per-node basis, i.e. all changes of
>   that node get merged before the next node
> - fragmentation may still be necessary for *that file*
> - given the "merge plan" for a node, alternative merge
>   strategies may be chosen

OK, the significant change here is not so much the two-phase (although that has merit in itself) but rather turning the processing order inside-out from (conceptually)

  REV_RANGES = [contiguous-revision-ranges needed for TREE]
  for REV_RANGE in REV_RANGES:
    3-way merge REV_RANGE into the whole TREE

to (conceptually)

  for each NODE in the TREE:
    MERGES = [find the best sequence of 3-way merges to merge all needed changes into NODE]
    for MERGE in MERGES:
      3-way-merge MERGE into NODE

This is a good idea in itself.  It certainly brings advantages as mentioned above.  I'm not sure what, if any, disadvantages it might have.

I don't know that the scenarios this helps with are our main concern at this point.  It would be good to keep this approach in mind when writing or designing any new merge code.  In terms of development effort I think it's more valuable to concentrate first on the advantages to to-and-fro merging (most commonly without subtree merges and without cherry-picks) that the 'symmetric merge' brings.

- Julian
Received on 2012-04-25 14:41:08 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.