Re: It's time to fix Subversion Merge

From: Andy Singleton <andy_at_assembla.com>
Date: Mon, 11 Jul 2011 13:57:58 -0400

I received a lot of good comments, and I will batch up my responses in
this note.

Stefan asks, essentially, "Can you improve the existing merge?" Yes, I
think that we can start with the existing merge code.

However, I also think that any implementation that uses subtree
mergeinfo, and does not have extensible mergeinfo, is doomed. Too much
effort goes into fixing up the subtree merge feature, and it makes the
tree change problems insoluble. So, we need to decisively cut off the
subtree options and move to a bigger and more extensible data
structure. That's why I proposed adding a new command, "newmerge". The
existing code won't be destabilized.

Paul notes that we need test cases. Yes, exactly. The first step in
this project is to make some test cases, and see how they perform with
the existing merge, and describe what users report as the problem with
these cases. This will settle the debate about whether the existing
merge is good enough. We can classify an alternate merge implementation
according to how many additional cases it handles correctly. I think a
test case is more than a patch. It is a series of commit and merge
operations.
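
To make "a series of commit and merge operations" concrete, here is a
minimal sketch of how such a test case could be written down as data.
All of the names (Step, MergeTestCase, score) are hypothetical and exist
only for illustration, and the "cyclic" sequence is a made-up example,
not a reproduction of any reported issue.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    op: str           # "commit" or "merge"
    branch: str       # branch that receives the commit or the merge
    source: str = ""  # for merges only: the branch merged from
    note: str = ""    # short description of what the step changes


@dataclass
class MergeTestCase:
    name: str
    steps: list = field(default_factory=list)
    expected: str = ""  # what a user expects the final merge to do

    def merges(self):
        """Return only the merge steps, in order."""
        return [s for s in self.steps if s.op == "merge"]


def score(cases, handles):
    """Count how many cases an implementation handles correctly.

    `handles` is a caller-supplied function: case -> bool.
    """
    return sum(1 for case in cases if handles(case))


# Hypothetical example: merge trunk into a branch, then merge back.
cyclic = MergeTestCase(
    name="cyclic merge",
    steps=[
        Step("commit", "trunk", note="edit a.txt"),
        Step("merge", "branch", source="trunk"),
        Step("commit", "branch", note="edit b.txt"),
        Step("merge", "trunk", source="branch"),
    ],
    expected="second merge brings only the b.txt edit to trunk",
)
```

Two merge implementations could then be compared by running score()
over the same list of cases with a different `handles` function for each.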

Mark and C. Michael Pilato raise the most serious issue. Subversion
merge problems come from the core architecture and have persisted over
many years. A complete fix may require a more radical change. And, it
is possible that SVN needs a bigger redesign even to meet the goals I
put out today. You have more experience with that than I do. We will
see. At this point, I think that merge can be significantly improved
for the existing server architecture.

Yes, the "cyclic merge" problem is a big one, and along with the tree
change problem, it accounts for most of the frustrating behavior of
Subversion merge - http://subversion.tigris.org/issues/show_bug.cgi?id=2837

I believe that cyclic merges can be handled with a bigger merge_history
/ mergeinfo file. When you do a merge, you make some edits to resolve
problems. Then, you commit the changes - all of the merged changesets,
plus the edits. You also write the instructions for resolving this
merge into the merge_history / mergeinfo file. The next time you go to
do a merge, you can replay any of the changes that you need. The new
merge_history will be a big file with a complete history.
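
As a rough illustration of the idea, the sketch below models a
merge_history as a list of records, each holding the merged changesets
and the edits made to resolve the merge. It assumes changesets are
plain revision numbers and a resolution is just the final text chosen
for a path. The class and method names are hypothetical; this is not an
existing Subversion format or API.

```python
from dataclasses import dataclass, field


@dataclass
class MergeRecord:
    source: str       # branch merged from
    target: str       # branch merged into
    changesets: list  # revision numbers brought in by this merge
    # path -> text the user settled on while resolving the merge
    resolutions: dict = field(default_factory=dict)


class MergeHistory:
    def __init__(self):
        self.records = []

    def record(self, rec):
        """Append one completed merge to the history."""
        self.records.append(rec)

    def already_merged(self, target):
        """All changesets recorded as merged into `target`."""
        seen = set()
        for rec in self.records:
            if rec.target == target:
                seen.update(rec.changesets)
        return seen

    def pending(self, target, candidates):
        """Candidate changesets not yet merged into `target`."""
        done = self.already_merged(target)
        return [c for c in candidates if c not in done]

    def replay_resolution(self, path):
        """Most recent recorded resolution for `path`, or None."""
        for rec in reversed(self.records):
            if path in rec.resolutions:
                return rec.resolutions[path]
        return None
```

On the next merge, pending() tells you which changesets still need to
be applied, and replay_resolution() lets you reuse an earlier manual
fix instead of asking the user again.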

This won't be a simple implementation, but the inside of a merge is
never simple. We need to add intelligence to the merge so that it looks
simple to the user. This intelligence can be incrementally improved
through test cases and the open source process.

New architecture might be required for handling moved and renamed
paths. This is a problem that comes up frequently in merges. However,
it also comes up in normal updates. From a merge point of view, moved
files should actually move and drag their changes with them, rather than
appear as new files with copy+delete.
* After we map to new files (manually, or with an algorithm) in an
update or a merge, we should remember the change in the merge_history.
That's why we make the history extensible.
* To automate this process, I think that moved files should be
identified by filename and tree structure, not by file ID. Yes, this is
a change in the way that Subversion thinks, but it is clearly a problem
that needs to be fixed. Other SCM systems like git use an algorithm
that makes a best guess on tree matches. As noted by Greg, git doesn't
do any other type of move tracking, and git merge works well.
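
A minimal sketch of a "best guess" matcher based only on filename and
tree structure, as suggested above. It pairs deleted paths with added
paths by comparing basenames and directory components. The weights
(0.7 / 0.3), the 0.6 threshold, and the function names are arbitrary
assumptions for illustration. This is not git's algorithm and not
anything Subversion implements.

```python
import posixpath
from difflib import SequenceMatcher


def path_score(old, new):
    """Score in [0, 1]: how plausible it is that `old` moved to `new`."""
    name_sim = SequenceMatcher(
        None, posixpath.basename(old), posixpath.basename(new)
    ).ratio()
    # Compare directories component by component, not char by char.
    dir_sim = SequenceMatcher(
        None,
        posixpath.dirname(old).split("/"),
        posixpath.dirname(new).split("/"),
    ).ratio()
    return 0.7 * name_sim + 0.3 * dir_sim


def guess_moves(deleted, added, threshold=0.6):
    """Greedily pair deleted paths with added paths, best score first."""
    candidates = sorted(
        ((path_score(d, a), d, a) for d in deleted for a in added),
        reverse=True,
    )
    moves, used_old, used_new = {}, set(), set()
    for score, d, a in candidates:
        if score < threshold:
            break  # remaining pairs score even lower
        if d in used_old or a in used_new:
            continue  # each path takes part in at most one move
        moves[d] = a
        used_old.add(d)
        used_new.add(a)
    return moves


deleted = ["src/util/parse.c", "docs/readme.txt"]
added = ["lib/util/parse.c", "src/main.c"]
moves = guess_moves(deleted, added)
```

Here the only pair above the threshold is src/util/parse.c and
lib/util/parse.c. The other paths would be treated as a plain delete
and a plain add. Whatever mapping is accepted (by the user or by the
algorithm) is what would be remembered in the merge_history.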

The work noted by Stefan on truMerge is a good example of this
strategy. We can do the same thing - http://trumerge.open.collab.net/ .
I completely agree with the major points in this implementation:
1) It uses "heuristics" to map trees together
2) "All merges are done at the root of the branch" and "All merges are
complete (no merges in sparse working copies, etc.)"

You can see that getting rid of the subtree merges is a necessary and
probably sufficient step for fixing the tree change problems.

Mark asks where we get the GUID/UUID for foreign merges. It already
exists, because we have a server UUID, as Daniel wrote:
<repository_UUID-revision_number>. We just need to keep track of it.
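
A small sketch of keeping track of that identifier, assuming it is
written exactly as repository UUID, a hyphen, then the revision number.
The ChangesetId type is hypothetical and the UUID in the example is
made up.

```python
import uuid
from typing import NamedTuple


class ChangesetId(NamedTuple):
    repo_uuid: str
    revision: int

    def __str__(self):
        return f"{self.repo_uuid}-{self.revision}"

    @classmethod
    def parse(cls, text):
        # The UUID itself contains hyphens, so split on the last one.
        repo, rev = text.rsplit("-", 1)
        uuid.UUID(repo)  # raises ValueError if the UUID is malformed
        return cls(repo, int(rev))


# Made-up repository UUID, revision 42.
cid = ChangesetId.parse("00000000-0000-0000-0000-000000000001-42")
```

Because the type is a tuple, two identifiers compare equal only when
both the repository and the revision match, which is what a foreign
merge would need.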

In systems like git, if the user wants to cherrypick, the user must
enter the complete GUID/UUID. However, that is probably not necessary for
Subversion. You can only cherrypick complete commits from the source,
not from other sources. So, you can leave out the UUID and just specify
the revision number. You can get complete merge commits with this
technique. Unfortunately, you are not guaranteed to have access to the
individual commits that were inside the merge. Because of this,
changesets inside merge commits will be vulnerable to "conflation". You
will have to sort through cases where you already have some, but not all,
of the changes that were in a merge commit you are merging, and you
won't be able to cherrypick inside the merge commit. I need to think
more about this case, and whether we should track individual commits
that were merged. That could be an extension.

On 7/11/2011 12:51 PM, C. Michael Pilato wrote:
> On 07/11/2011 11:46 AM, Andy Singleton wrote:
>> Many developers are moving from Subversion to other SCM systems that have
>> better merge capabilities. I have posted an article with a proposal to fix
>> this problem, here:
>>
>> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
> [...]
>
>> I think that we can build a newmerge prototype by stripping down the
>> existing merge to remove the subtree options, and moving to the extensible
>> merginfo format. It will be useful to get advice about this from experienced
>> team members.
> Your optimism is lovely (and welcome, even!), but I am not as convinced as
> you that the reason why Subversion's merge functionality is subpar is as
> superficial as the items you call out (and which are implied by your
> prototyping plan above).
>
> Very little (if anything) about your proposal touches on the *real*
> problems, such as Subversion's handling of moved/renamed objects, tree
> conflict detection/handling/resolution, changeset conflation (caused by the
> fundamental diff+patch approach Subversion takes to merges rather than
> first-class changeset support), etc. These real problems with merging were
> documented many years before the merge tracking feature was ever conceived,
> and neither that feature nor its skin-deep-only warts you aim to address
> made a dent in solving those very real problems.
>
> I don't aim to discourage -- far from it! On the contrary, I want to
> encourage a deeper review of the situation. It's entirely possible that, in
> doing so, you will find solutions for the deeper core problems here, and
> obviously the Subversion community (devs and users alike) would love that!
>
> -- C-Mike
>
> [1] I'll grant that in your blog post, you at least acknowledge the tree
> changes problem and place great stock in your extensible merge tracking
> format toward some future solution.
>

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton
Received on 2011-07-11 19:58:44 CEST

This is an archived mail posted to the Subversion Dev mailing list.
