RE: merge performance (was: Re: Distributed Subversion)

From: Todd C. Gleason <tgleason_at_impac.com>
Date: Thu, 11 Jun 2009 10:27:49 -0700

> -----Original Message-----
> From: Stefan Sperling [mailto:stsp_at_elego.de]
> Sent: Thursday, June 11, 2009 10:43 AM
> To: Gleason, Todd
> Cc: Nathan Nobbe; Les Mikesell; Marko Käning; users_at_subversion.tigris.org
> Subject: Re: merge performance (was: Re: Distributed Subversion)
>
> Why isn't it enough to merge the entire revision, and leave out
> bits you know you don't need yet, and declare it merged unless
> the merge is reverted again?
>
> Is it because the changes you commit to trunk aren't self-contained?
> Like if you commit a bug fix and some unrelated edits in a single revision
> to trunk, and then you want to backport just the bugfix into a branch?

Yes, it's a case of hindsight being 20/20. In my case I sometimes commit something of general utility to support something in a specific branch. I want the general stuff to go into the trunk right now, thus avoiding future conflicts and enabling other developers, but also to keep working in my branch on the specifics. I usually commit these separately, but sometimes I make a mistake.

So I want to merge part of the commit now, and the rest of it later. Whatever makes this easiest for me is fine, but the ability to tell Subversion to merge just a part of the commit now seems like a good fit for what I'm doing. It seems simple enough to record, "I merged from here to here, and because of the specified paths that doesn't include all the changes from the revision". Then if I later merge the same revision with a different path, the remainder can be computed. Now, performance-wise, it might be a pain. It doesn't seem too insane on the surface, but then again I'm not Paul.
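The "remainder can be computed" idea can be sketched in a few lines. This is only an illustrative model, not how Subversion stores or evaluates mergeinfo: the function names, the example paths, and the representation of a revision as a plain set of changed paths are all assumptions made for the sketch.

```python
from typing import Set


def is_under(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` or lies beneath it ("" means the root)."""
    prefix = prefix.rstrip("/")
    if prefix == "":
        return True
    return path == prefix or path.startswith(prefix + "/")


def remaining_paths(changed: Set[str], merged_prefixes: Set[str]) -> Set[str]:
    """Paths changed in a revision that no earlier partial merge has covered."""
    return {
        p for p in changed
        if not any(is_under(p, m) for m in merged_prefixes)
    }


# Hypothetical revision A: general-purpose code plus branch-specific code.
changed_in_a = {"lib/util.c", "lib/util.h", "app/feature/main.c"}

# The first, partial merge of A was restricted to the "lib" subtree.
merged_so_far = {"lib"}

# What a later merge of the same revision still has to bring over.
remainder = remaining_paths(changed_in_a, merged_so_far)
```

Under these assumptions, a second merge of A restricted to `app` would leave an empty remainder, at which point the revision could be treated as fully merged.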

At any rate, if you change how it works, it would still be nice to let people merge part now and part later...or merge part now and revert that partial merge later...or whatever other corner cases exist. Regardless of the details of how you track it, I think it's a useful feature. Could I live without it? Probably, but I would resort to merging by hand, and hope that when I did the official merge later it didn't screw up my manual merge. Of course, I would also be less inclined to do the partial merge at all, which isn't a good thing because I might then face lots of later changes that I'm not willing to handle. And with Subversion's current ability to track the partial merge, those later merges are handled very nicely.

I discovered this by accident the other day: I merged revisions X, Y, and Z, and got a lot of conflicts and tree conflicts, because X, Y, and Z were dependent upon A, B, and C. I re-did the merge, adding in part of A and all of B and C. The remaining conflicts now made more sense and were easy to handle, and there were no tree conflicts (the earlier ones involved files added in A, B, and C and modified in X, Y, and Z).

If I couldn't have done my partial merge of A, then I probably would have done an svn copy of the new files, and a 2-way diff of the modified files in A. Then I would have merged B, C, X, Y, and Z. It would have taken a lot more time that way.

I could have done a full merge of A, too, but then I would have had to revert the remaining changes from A, and wonder what would happen if I tried to repeat the merge. Would the reverted changes get merged? Would the retained changes get merged a second time and possibly cause conflicts or bad merge results?

> But even in this case, why can't people live with just removing the
> stuff they don't need before committing the merge result?
> Why does this need to be tracked? Conflict resolution is done
> during merges all the time, and isn't being tracked either,
> and no one is complaining about it.
>
> > What I would really like is a way to prevent subtree mergeinfo from
> > being endlessly maintained, and maybe to clean it up by migrating it
> > towards the root.
>
> We call this "elision". It should be possible. I've been told
> people are using various scripts to do it.

I remember reading a bit about elision at http://subversion.tigris.org/merge-tracking/func-spec.html . It would be great to have this in place, even if I had to run some sort of "svn elide" subcommand to make it happen. In particular I would not expect a switch or update to do it (because elision means moving the mergeinfo around, and users expect switch/update/cleanup not to mark additional files/properties as modified).

Note that my reading of elision is that it is simply about migrating mergeinfo. If Subversion did it, then a lot of problems would go away. Alternatively, a lot of problems would still go away if Subversion didn't update mergeinfo unnecessarily.
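To make the idea of elision concrete, here is a minimal sketch of one way it could work on a simplified model. It assumes mergeinfo is a mapping from source path to a set of revision numbers, and that a subtree inherits its nearest ancestor's mergeinfo with the relative path appended to each source path. Real Subversion mergeinfo uses revision ranges and has further rules (such as non-inheritable ranges) that are ignored here; the names `inherited` and `elide` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Simplified model: merge source path -> revisions merged from it.
Mergeinfo = Dict[str, FrozenSet[int]]


def inherited(tree: Dict[str, Mergeinfo], path: str) -> Optional[Mergeinfo]:
    """Mergeinfo `path` would inherit from its nearest ancestor, if any.

    The working-copy root is the empty string. Source paths are extended
    by the part of `path` that lies below the ancestor.
    """
    parts = path.split("/") if path else []
    for i in range(len(parts) - 1, -1, -1):
        ancestor = "/".join(parts[:i])
        if ancestor in tree:
            rel = "/".join(parts[i:])
            return {f"{src}/{rel}": revs for src, revs in tree[ancestor].items()}
    return None


def elide(tree: Dict[str, Mergeinfo]) -> Dict[str, Mergeinfo]:
    """Drop explicit subtree mergeinfo that merely repeats what is inherited."""
    result = dict(tree)
    # Visit the deepest paths first so redundant entries collapse upward.
    for path in sorted(tree, key=lambda p: p.count("/"), reverse=True):
        if inherited(result, path) == result[path]:
            del result[path]
    return result


# Hypothetical working copy: "src" repeats the root's information,
# while "doc" has received only one of the two revisions.
tree = {
    "": {"/trunk": frozenset({10, 11})},
    "src": {"/trunk/src": frozenset({10, 11})},
    "doc": {"/trunk/doc": frozenset({10})},
}

elided = elide(tree)
```

In this model the redundant entry on `src` disappears while the genuinely different entry on `doc` stays, which is the "migrating towards the root" behaviour discussed above.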

> > As time marches on, lots of other L3-level directories get mergeinfo,
> > as well as L2-level folders, etc., up to the Root itself. It's a mess
> > and every time a merge is done to Root, ALL the mergeinfo properties
> > get updated. Why? I don't know, especially when most of them seem
> > completely unrelated to any given merge. I end up committing only the
> > directories containing what I think is relevant mergeinfo. And then I
> > just revert the other mergeinfo changes anyway. What a pain. I'm
> > fighting Subversion.
>
> This might all go away if we stopped tracking partial merges.
> Why do you really want to track them if doing so causes such headaches?
>
> It might also go away if we fixed all the bugs in merge-tracking.
>
> I'm not sure which is easier.

It seems that the headaches are caused by lack of elision and by unnecessary updates to mergeinfo, as I described above. So, agreeing with your second paragraph, I see the headaches as being due to bugs/missing features in Subversion, not as any unavoidable consequence of partial merge tracking.

> I don't fully understand all the reasons of mergeinfo creation
> either, but in several cases it was found that some of it was
> actually created because of bugs.

I got the idea that the 1.5.x clients created a lot more mergeinfo than needed, and also couldn't handle subtree mergeinfo well at all. Upgrading to 1.6.x has alleviated the problem, but obviously not eliminated it. So it bought us some time, but if the problems continue, then we will have to look at what scripts can help us.

> Given the complexity, I'm not surprised it's hard to get it working
> correctly and fast at the same time.
> And we've already got a lot of repositories out there which have
> mergeinfo in various places, and cleaning it up is manual labour.

I don't trust myself, let alone less-knowledgeable folks, to clean up the mergeinfo properly, so I hope svn can do this semi-automatically in a near-term release, or else (as I said above) I'll have to look into scripts that do it for us. In my particular company the problem isn't critical yet, though I see how it has been for others.

> And how can we be sure that we've already fixed all cases where
> mergeinfo created is not legit?

All you can do is test, preferably by setting up a lot of known cases that were poorly handled in earlier versions of Subversion and seeing how your new code handles them.

> So people complain Subversion merges are slow, and they say that
> with git/Mercurial etc. merging is fast and efficient for them.
> Has anyone ever heard people complaining about git/Mercurial etc.
> not tracking partial merges?

I don't have experience with DVCS tools, but in my experience correctness comes first, then proper management of resources (look for and eliminate memory leaks so that operations scale well), and only then optimization. I think the bulk of the severe performance problems I have heard about were due to resource problems anyway, so it seems crucial to test large-scale operations and make sure that memory isn't wasted and that no unnecessary bandwidth is used.

Having less performance than a DVCS isn't going to make all your users jump ship overnight, but not being able to work on a large project at all will have a much quicker impact.

All that said, I'm interested in hearing whether DVCS users are missing this feature as well. It seems like a selling point; until today I had no idea it was relatively unique to Subversion. (I say "relatively" because I don't know whether ClearCase, Perforce, etc. have it.)

--Todd

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2361389

Received on 2009-06-11 19:30:42 CEST
