RE: merge performance (was: Re: Distributed Subversion)
From: Todd C. Gleason <tgleason_at_impac.com>
Date: Thu, 11 Jun 2009 10:27:49 -0700
Yes, it's a case of hindsight being 20/20. In my case I sometimes commit something of general utility to support something in a specific branch. I want the general stuff to go into the trunk right now, thus avoiding future conflicts and enabling other developers, but also to keep working in my branch on the specifics. I usually commit these separately, but sometimes I make a mistake.
So I want to merge part of the commit now, and the rest of it later. Whatever makes this easiest for me is fine, but the ability to tell Subversion to merge just a part of the commit now seems like a good fit for what I'm doing. It seems that it is simple to check, "I merged from here to here, and that doesn't include all the changes from the revision due to the specified paths". So if I later merge the same revision with a different path, the remainder can be computed. Now, performance-wise, it might be a pain. It doesn't seem too insane on the surface, but then again I'm not Paul.
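To make the "remainder can be computed" idea concrete, here is a toy Python sketch. It is only an illustration of the bookkeeping, not how Subversion actually stores or evaluates mergeinfo; the function name `remaining_paths` and the example paths are made up for this example.

```python
# Toy model only: NOT Subversion's real mergeinfo format or algorithm.
# remaining_paths and the sample paths below are hypothetical.

def remaining_paths(changed_paths, merged_subtrees):
    """Return the paths changed in a revision that no earlier merge covered."""
    def covered(path):
        # A path is covered if it is a merged subtree root or lies beneath one.
        return any(path == root or path.startswith(root.rstrip("/") + "/")
                   for root in merged_subtrees)
    return sorted(p for p in changed_paths if not covered(p))

# A revision that mixed general-utility changes with branch-specific ones.
changed = {"lib/util.c", "lib/util.h", "app/feature.c"}

# The first merge was limited to the "lib" subtree.
print(remaining_paths(changed, {"lib"}))         # ['app/feature.c']

# After a later merge of "app", nothing from this revision is left.
print(remaining_paths(changed, {"lib", "app"}))  # []
```

The point is just that recording "which subtree was merged" alongside "which revision" is enough to work out what a later merge of the same revision still has to bring over.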
At any rate, if you change how it works, it would still be nice to let people merge part now and part later...or merge part now and revert that partial merge later...or whatever other corner cases exist. Regardless of the details of how you track it, I think it's a useful feature. Could I live without it? Probably, but I would resort to merging by hand and hope that the official merge later didn't screw up my manual merge. Of course, I would also be less inclined to do the partial merge at all, which isn't a good thing, because I might then face lots of later changes that I'm not willing to handle. And with Subversion's current ability to track the partial merge, those later merges are handled very nicely.
I discovered this by accident the other day: Merged revisions X, Y, Z, and got a lot of conflicts and tree conflicts, because X, Y, and Z were dependent upon A, B, and C. Re-did it adding in part of A, and all of B and C. The remaining conflicts now made more sense and were easy to handle, and there were no tree conflicts (files added in A, B, and C and modified in X, Y, and Z).
If I couldn't have done my partial merge of A, then I would have probably done an svn copy of the new files, and a 2-way diff of the modified files in A. Then I would have done B, C, X, Y, and Z. It would have taken a lot more time that way.
I could have done a full merge of A, too, but then I would have had to revert the remaining changes from A, and wonder what would happen if I tried to repeat the merge. Would the reverted changes get merged? Would the retained changes get merged a second time and possibly cause conflicts or bad merge results?
> But even in this case, why can't people live with just removing the
I remember reading a bit about elision at http://subversion.tigris.org/merge-tracking/func-spec.html . It would be great to have this in place, even if I had to run some sort of "svn elide" subcommand to make it happen. In particular, I would not expect a switch or update to do it (because elision means moving the mergeinfo around, and users expect switch/update/cleanup not to mark additional files/properties as modified).
Note that my reading of elision is that it is simply about migrating mergeinfo. If Subversion did it, a lot of problems would go away. Alternatively, a lot of problems would still go away if Subversion didn't update mergeinfo unnecessarily.
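For what it's worth, here is how I picture elision, as a toy Python sketch. It assumes my reading above is right: mergeinfo on a path is redundant when the nearest ancestor that carries mergeinfo records the same thing. The `elide` function and the dictionary layout are invented for illustration; real mergeinfo also tracks merge source paths and revision ranges per source, which this ignores.

```python
# Toy model of elision under the assumption stated above.
# Real svn:mergeinfo is richer than "path -> set of revisions".

def elide(mergeinfo):
    """Drop entries whose nearest ancestor with mergeinfo has identical revisions.

    `mergeinfo` maps a path to the set of revisions merged into it.
    Returns a new dict; the input is left untouched.
    """
    def nearest_ancestor(path):
        parts = path.split("/")
        # Walk upward: "a/b/c" tries "a/b", then "a".
        for i in range(len(parts) - 1, 0, -1):
            parent = "/".join(parts[:i])
            if parent in mergeinfo:
                return parent
        return None

    result = {}
    for path, revs in mergeinfo.items():
        parent = nearest_ancestor(path)
        if parent is not None and mergeinfo[parent] == revs:
            continue  # redundant: the ancestor already records the same merges
        result[path] = revs
    return result

info = {
    "trunk": {10, 11, 12},
    "trunk/lib": {10, 11, 12},  # same as trunk, so it can elide
    "trunk/app": {10},          # genuinely partial, so it must stay
}
print(sorted(elide(info)))  # ['trunk', 'trunk/app']
```

In this picture only the genuinely partial subtree keeps its own mergeinfo, which is the outcome I would hope an explicit "svn elide" step could produce.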
> > As time marches on, lots of other L3-level directories get mergeinfo,
It seems that the headaches are caused by lack of elision and by unnecessary updates to mergeinfo, as I described above. So, agreeing with your second paragraph, I see the headaches as being due to bugs/missing features in Subversion, not as any unavoidable consequence of partial merge tracking.
> I don't fully understand all the reasons of mergeinfo creation
I got the idea that the 1.5.x clients created a lot more mergeinfo than needed, and also couldn't handle subtree mergeinfo well at all. Upgrading to 1.6.x has alleviated this, but obviously not removed it. So it bought us some time, but if the problems continue, then we will have to look at what scripts can help us.
> Given the complexity, I'm not surprised it's hard to get it working
I don't trust myself, let alone less-knowledgeable folks, to clean up the mergeinfo properly, so I hope svn can do this semi-automatically in a near-term release, or else (as I said above) I'll have to look into scripts that do it for us. In my particular company the problem isn't critical yet, though I see how it has been for others.
> And how can we be sure that we've already fixed all cases were
All you can do is test, preferably by setting up a lot of known cases that were poorly handled in earlier versions of Subversion and seeing how your new code handles them.
> So people complain Subversion merges are slow, and they say that
I don't have experience with DVCSs, but in my experience correctness comes first, then proper management of resources (look for and eliminate memory leaks so that operations scale well), and optimization last. I think the bulk of the severe performance problems I have heard about were due to resource problems anyway, so it seems crucial to test large-scale operations and make sure memory isn't wasted and bandwidth isn't used unnecessarily.
Having less performance than a DVCS isn't going to make all your users jump ship overnight, but not being able to work on a large project at all will have a much quicker impact.
All that said, I'm interested in hearing whether DVCS users are missing this feature as well. It seems like a selling point that I didn't have any idea was relatively unique to Subversion until today. (I say "relatively" because I don't know whether ClearCase, Perforce, etc. have it.)
This is an archived mail posted to the Subversion Users mailing list.