Re: merge performance (was: Re: Distributed Subversion)

From: Stefan Sperling <stsp_at_elego.de>
Date: Thu, 11 Jun 2009 17:42:40 +0100

On Thu, Jun 11, 2009 at 09:04:29AM -0700, Todd C. Gleason wrote:
> > -----Original Message-----
> > From: Stefan Sperling [mailto:stsp_at_elego.de]
> > Sent: Wednesday, June 10, 2009 11:28 AM
> > To: Nathan Nobbe
> > Cc: Les Mikesell; Marko Käning; users_at_subversion.tigris.org
> > Subject: merge performance (was: Re: Distributed Subversion)
> >
> > If it is enough for most other systems to solve the simple problem
> > of merging at units of entire revisions only, then maybe that is
> > what Subversion should be doing, too. We could do away with subtree
> > mergeinfo, just have mergeinfo at branch roots and declare rX merged
> > into branch B as soon as someone merges _anything_ from rX into
> > branch B. If you want to merge more from rX, well, then undo the
> > previous merge of rX first, and then merge rX again. From conversations
> > I've had with Paul (our merge-tracking guru), it would seem that this
> > strategy would improve performance a lot. Would people like this?
>
> I just made use of this capability (for my first time) two days ago,
> because I had committed more than I wanted to merge between
> branch/trunk at the moment. So I think it's useful, and something to
> be mentioned as a superior feature of Subversion. Obviously, when you
> merge a partial revision, you want to make extra sure your merge
> results are good before committing them, but without this feature, you
> would end up doing the same thing by hand, and have to clean it up
> later when you do the actual merge.

Why isn't it enough to merge the entire revision, and leave out
bits you know you don't need yet, and declare it merged unless
the merge is reverted again?
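
A minimal sketch of the revision-granular scheme proposed above. It is a hypothetical model (the class and method names are invented for illustration), not how Subversion 1.6 records svn:mergeinfo. Mergeinfo lives only at the branch root, and any merge from rX, partial or not, marks all of rX as merged. Merging more of rX later requires reverse-merging it first.

```python
# Hypothetical model of the revision-granular tracking proposed in this
# thread; this is NOT how Subversion 1.6 records svn:mergeinfo.
class BranchMergeinfo:
    """Mergeinfo kept only at the branch root: one revision set per source."""

    def __init__(self):
        self.merged = {}  # source path -> set of merged revision numbers

    def merge(self, source, rev, paths=None):
        # `paths` stands for a partial merge (only some files from rev).
        # It is deliberately ignored: any merge from rev marks all of rev merged.
        revs = self.merged.setdefault(source, set())
        if rev in revs:
            raise ValueError(
                f"r{rev} from {source} is already merged; reverse-merge it first")
        revs.add(rev)

    def reverse_merge(self, source, rev):
        # Undoing the merge clears the record so rev can be merged again in full.
        self.merged.get(source, set()).discard(rev)

    def is_merged(self, source, rev):
        return rev in self.merged.get(source, set())
```

Because the record is one revision set per source at a single path, answering "is rX merged?" is a set lookup rather than a walk over subtree mergeinfo.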

Is it because the changes you commit to trunk aren't self-contained?
Like if you commit a bug fix and some unrelated edits in a single revision
to trunk, and then you want to backport just the bugfix into a branch?

But even in this case, why can't people live with just removing the
stuff they don't need before committing the merge result?
Why does this need to be tracked? Conflict resolution is done
during merges all the time, and isn't being tracked either,
and no one is complaining about it.

> What I would really like is a way to prevent subtree mergeinfo from
> being endlessly maintained, and maybe to clean it up by migrating it
> towards the root.

We call this "elision". It should be possible. I've been told
people are using various scripts to do it.
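
The core check such an elision script might perform can be sketched as follows, under simplifying assumptions: a child's explicit mergeinfo is redundant if it equals what it would inherit from its nearest ancestor with mergeinfo, i.e. the ancestor's mergeinfo with the child's relative path appended to each merge source. The function names are hypothetical, and the parser assumes plain `path:N-M,K` lines without non-inheritable (`*`) ranges.

```python
def parse_mergeinfo(text):
    """Parse an svn:mergeinfo value into {source_path: set_of_revisions}.

    Assumes simple 'path:N-M,K' lines; non-inheritable ranges marked
    with '*' are not handled and would raise ValueError.
    """
    info = {}
    for line in text.strip().splitlines():
        path, ranges = line.rsplit(":", 1)
        revs = set()
        for part in ranges.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                revs.update(range(int(lo), int(hi) + 1))
            else:
                revs.add(int(part))
        info[path] = revs
    return info


def inherited(ancestor_info, rel_path):
    """Mergeinfo a descendant at rel_path would inherit from an ancestor."""
    return {src.rstrip("/") + "/" + rel_path: revs
            for src, revs in ancestor_info.items()}


def can_elide(child_info, ancestor_info, rel_path):
    """True if the child's explicit mergeinfo adds nothing to what it inherits."""
    return child_info == inherited(ancestor_info, rel_path)
```

A script would then delete svn:mergeinfo from every subtree for which `can_elide` returns True.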

> For sake of discussion, assume we have a working copy directory
> structure like this:
>
> Root\L1\L2\L3\L4 (in Windows) (Root = trunk or something similar, not
> the actual root of the repo)
>
> The directory names are just abstractions, of course. I may have
> several parallel L1's, L2's, etc.
>
> Say I do a merge to L3 and commit L3. L3 now has mergeinfo. It all
> works quickly right now because L3 doesn't contain more than a few L4
> directories, and I felt more comfortable merging to L3. All the
> merged changes in this case fit under L3, so the merge results are
> effectively the same whether I merge at L3 or at the Root. The only
> difference I know of is where the mergeinfo would get recorded.

Yes.
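
A simplified model of mergeinfo inheritance shows why the two merges are equivalent for paths under L3. A path without explicit mergeinfo uses that of its nearest ancestor, with the remaining relative path appended to each merge source. The function name and the dictionary layout (working-copy paths as relative POSIX strings) are assumptions made for illustration.

```python
import posixpath


def effective_mergeinfo(path, explicit):
    """Return the mergeinfo that applies to `path`.

    explicit: {wc_path: {source_path: set_of_revs}} for paths carrying
    svn:mergeinfo. Walks up to the nearest path with explicit mergeinfo
    and appends the remaining relative path to every merge source.
    Assumes relative working-copy paths such as 'Root/L1/L2'.
    """
    current = path
    while current:
        if current in explicit:
            rel = posixpath.relpath(path, current)
            if rel == ".":
                return explicit[current]
            return {src.rstrip("/") + "/" + rel: revs
                    for src, revs in explicit[current].items()}
        current = posixpath.dirname(current)
    return {}


# r20 from /trunk, recorded at Root versus at L3.
at_root = {"Root": {"/trunk": {20}}}
at_l3 = {"Root/L1/L2/L3": {"/trunk/L1/L2/L3": {20}}}
```

For Root/L1/L2/L3/L4 both layouts yield `{"/trunk/L1/L2/L3/L4": {20}}`. Only paths outside L3, such as Root/L1, see a difference.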

> As time marches on, lots of other L3-level directories get mergeinfo,
> as well as L2-level folders, etc., up to the Root itself. It's a mess
> and every time a merge is done to Root, ALL the mergeinfo properties
> get updated. Why? I don't know, especially when most of them seem
> completely unrelated to any given merge. I end up committing only the
> directories containing what I think is relevant mergeinfo. And then I
> just revert the other mergeinfo changes anyway. What a pain. I'm
> fighting Subversion.

This might all go away if we stopped tracking partial merges.
Why do you really want to track them if doing so causes such headaches?

It might also go away if we fixed all the bugs in merge-tracking.

I'm not sure which is easier.

> What I would like is for the irrelevant mergeinfo not to get updated.
>
> Again, if Subversion just didn't always update every bit of mergeinfo
> in your entire WC, but only updated the mergeinfo for whatever got
> changed, it might solve the problem sufficiently that I don't really
> care where the mergeinfo is.
>
> Maybe there are some complex technical reasons why Subversion behaves
> as it does right now, but if so, I don't understand them, and I don't
> think most users want to understand. We simply see an ever-growing
> number of files/folders getting modified properties as a result of a
> merge, and it confuses us so we struggle to commit a sane merge
> result.

I don't fully understand all the reasons for mergeinfo creation
either, but in several cases it was found that some of it was
actually created because of bugs.
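
One reason that is not a bug can be sketched in a simplified model of the 1.5/1.6 behaviour Todd describes. A path with explicit mergeinfo does not inherit from its parent, so a merge into Root must also add the new revisions to every subtree that carries its own mergeinfo, whether or not those revisions touched it. Otherwise those subtrees would look unmerged. The function below is a hypothetical illustration, not Subversion's code. It ignores whether each revision actually affected a subtree, and it does not model inheritance for a target that lacks explicit mergeinfo.

```python
def record_merge(target, source, revs, explicit):
    """Record `revs` merged from `source` into `target` (simplified model).

    explicit: {wc_path: {source_path: set_of_revs}}, mutated in place.
    Returns the sorted list of paths whose svn:mergeinfo changed.
    """
    changed = []
    for path, info in explicit.items():
        if path == target:
            src = source
        elif path.startswith(target + "/"):
            # Explicit subtree mergeinfo is complete on its own, so it
            # must receive the new revisions too.
            src = source.rstrip("/") + "/" + path[len(target) + 1:]
        else:
            continue
        info.setdefault(src, set()).update(revs)
        changed.append(path)
    if target not in explicit:
        explicit[target] = {source: set(revs)}
        changed.append(target)
    return sorted(changed)
```

Every extra subtree with explicit mergeinfo is one more property modification per merge, which matches the growing number of modified properties described above.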

E.g. see this bugfix which will be in 1.6.3:

Author: pburba
Date: Thu Jun 4 17:28:32 2009
New Revision: 37931

Log:
Stop propagation of self-referential mergeinfo via --reintegrate merges.

While we recommend that users delete a branch once they reintegrate it, it
should still be possible to keep sync merging to the branch if a --record-only
merge is performed to avoid cyclical merge problems. However there is a bug
where, even if the record-only merge is performed correctly, subsequent
reintegrations can produce self-referential mergeinfo on the target. This
mergeinfo can then propagate to other branches where it can appear as
legitimate (non-self-referential) mergeinfo. This in turn can make it
appear that sync merges have been performed when they have not and
vice versa.
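
"Self-referential" here means mergeinfo on a path that names that same path as its merge source. A minimal cleanup filter, assuming mergeinfo already parsed into `{source_path: set_of_revisions}`, might look like the sketch below. The function name is hypothetical, and this is not the fix committed in r37931, which prevents the mergeinfo from being created in the first place.

```python
def strip_self_referential(target_repos_path, info):
    """Drop mergeinfo entries whose merge source is the target path itself.

    info: {source_path: set_of_revisions} as recorded on target_repos_path.
    """
    target = target_repos_path.rstrip("/")
    return {src: revs for src, revs in info.items()
            if src.rstrip("/") != target}
```

Removing such entries before the mergeinfo propagates to other branches keeps it from later being mistaken for a genuine merge record.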

If you can follow the description of this problem, you get a glimpse
of how complex the merge-tracking machinery really is, and what
Paul is dealing with when finding and fixing those bugs. And yes,
there's just one guy working on fixing these problems right now,
as most people wouldn't go near it in order to keep their sanity.
I'm not suggesting that Paul is insane, he seems to be coping well :)

Given the complexity, I'm not surprised it's hard to get it working
correctly and fast at the same time.
And we've already got a lot of repositories out there which have
mergeinfo in various places, and cleaning it up is manual labour.

And how can we be sure that we've already fixed all cases where
illegitimate mergeinfo is created?

So people complain Subversion merges are slow, and they say that
with git/Mercurial etc. merging is fast and efficient for them.
Has anyone ever heard people complaining about git/Mercurial etc.
not tracking partial merges?

Stefan
Received on 2009-06-11 18:44:46 CEST
