
Re: merge performance (was: Re: Distributed Subversion)

From: Stefan Sperling <stsp_at_elego.de>
Date: Thu, 11 Jun 2009 19:37:48 +0100

On Thu, Jun 11, 2009 at 10:27:49AM -0700, Gleason, Todd wrote:
> I discovered this by accident the other day: Merged revisions X, Y,
> Z, and got a lot of conflicts and tree conflicts, because X, Y, and Z
> were dependent upon A, B, and C. Re-did it adding in part of A, and
> all of B and C. The remaining conflicts now made more sense and were
> easy to handle, and there were no tree conflicts (files added in A, B,
> and C and modified in X, Y, and Z).
>
> If I couldn't have done my partial merge of A, then I would have
> probably done an svn copy of the new files, and a 2-way diff of the
> modified files in A. Then I would have done B, C, X, Y, and Z. It
> would have taken a lot more time that way.

That's a nice use case.
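
For the archives, that workflow might look roughly like this (the
revision numbers, branch URL, and subdirectory below are placeholders,
not taken from your mail):

  # first attempt: rX, rY, rZ alone produce lots of conflicts
  $ svn merge -c 100,101,102 ^/branches/feature .
  $ svn revert -R .

  # second attempt: pull in just the needed part of rA first,
  # then rB, rC, and the original three revisions
  $ svn merge -c 97 ^/branches/feature/some/subdir some/subdir
  $ svn merge -c 98,99,100,101,102 ^/branches/feature .

Note that the subtree merge in the second attempt is exactly the kind
of operation that records subtree mergeinfo, which we get to below.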

> I could have done a full merge of A, too, but then I would have had to
> revert the remaining changes from A, and wonder what would happen if I
> tried to repeat the merge. Would the reverted changes get merged?
> Would the retained changes get merged a second time and possibly cause
> conflicts or bad merge results?

If reverting a merged revision restored the same state as before the
revision was merged, then we could simply merge the entire revision
again without problems.

But this assumption might not hold, because additional changes might
have been made to the affected files since the revision was first
merged. These changes might themselves conflict with reverting the
merged revision, leaving the tree in a broken state. You'd then have
to run the merge again into whatever tree you have in front of you.
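
(To make that concrete: "reverting a merged revision" here means
recording a reverse merge, e.g., with r97 standing in for the revision
in question:

  $ svn merge -c -97 ^/branches/feature .

and it is exactly this reverse merge that can conflict with local
changes made since r97 was originally merged.)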

So yeah, it looks like there'd be a lot to figure out manually
in some situations. :/

> > But even in this case, why can't people live with just removing the
> > stuff they don't need before committing the merge result?
> > Why does this need to be tracked? Conflict resolution is done
> > during merges all the time, and isn't being tracked either,
> > and no one is complaining about it.
> >
> > > What I would really like is a way to prevent subtree mergeinfo from
> > > being endlessly maintained, and maybe to clean it up by migrating it
> > > towards the root.
> >
> > We call this "elision". It should be possible. I've been told
> > people are using various scripts to do it.
>
> I remember reading a bit about elision at
> http://subversion.tigris.org/merge-tracking/func-spec.html . It would
> be great to have this in place, even if I had to run some sort of "svn
> elide" subcommand to make it happen. In particular I would not expect
> for a switch or update to do it (because elision means moving the
> mergeinfo around, and users expect switch/update/cleanup not to mark
> additional files/properties as modified).

I just checked: the merge code actually does elision in certain cases.
It does not seem to be effective enough to clean up the propagation
of mergeinfo at the scale seen in the wild, probably because the
elision code was written for simple cases, not for cases where bugs
cause mergeinfo to appear in places where it shouldn't be appearing.

> Note that my reading of elision is that this is simply about migrating
> merge info.

So is mine.

We could have 'svnadmin elide', which would try to elide all mergeinfo
in the repository as much as possible. This has likely been suggested
before, but I can't find an issue in the tracker.
I'll have to look harder, or ask Paul.
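
Until something like that exists, a rough manual approximation (only
a sketch, not one of the scripts I mentioned above, and you have to
verify yourself that the subtree mergeinfo really is redundant with
its parent's before deleting it) would be:

  # see which paths carry mergeinfo
  $ svn propget svn:mergeinfo --recursive .

  # drop subtree mergeinfo that the parent already covers
  $ svn propdel svn:mergeinfo path/to/subtree
  $ svn commit -m "Elide redundant subtree mergeinfo."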

> > > As time marches on, lots of other L3-level directories get mergeinfo,
> > > as well as L2-level folders, etc., up to the Root itself. It's a mess
> > > and every time a merge is done to Root, ALL the mergeinfo properties
> > > get updated. Why? I don't know, especially when most of them seem
> > > completely unrelated to any given merge. I end up committing only the
> > > directories containing what I think is relevant mergeinfo. And then I
> > > just revert the other mergeinfo changes anyway. What a pain. I'm
> > > fighting Subversion.
> >
> > This might all go away if we stopped tracking partial merges.
> > Why do you really want to track them if doing so causes such headaches?
> >
> > It might also go away if we fixed all the bugs in merge-tracking.
> >
> > I'm not sure which is easier.
>
> It seems that the headaches are caused by lack of elision and by
> unnecessary updates to mergeinfo, as I described above. So, agreeing
> with your second paragraph, I see the headaches as being due to
> bugs/missing features in Subversion, not as any unavoidable
> consequence of partial merge tracking.

It's certainly not unavoidable.
The question is how hard it is to avoid entirely :)
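
By the way, the revert-the-noise step you describe could be
semi-automated. A sketch, assuming the directories with relevant
mergeinfo have already been committed and everything still showing
' M' (a property-only modification) in status is noise:

  $ svn status | grep '^ M' | awk '{print $2}' | xargs svn revert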

> > I don't fully understand all the reasons of mergeinfo creation
> > either, but in several cases it was found that some of it was
> > actually created because of bugs.
>
> I got the idea that the 1.5.x clients created a lot more mergeinfo
> than needed, and also couldn't handle subtree mergeinfo well at all.
> Upgrading to 1.6.x has alleviated this, but obviously not removed it.

This is correct.

> > And how can we be sure that we've already fixed all cases where
> > the mergeinfo created is not legit?
>
> All you can do is test, preferably by setting up a lot of known cases
> that were poorly handled in earlier versions of Subversion and seeing
> how your new code handles them.

Paul is very thorough about adding regression tests for the things
he fixes. The test script containing the merge regression tests
is among those that take the longest time to run.
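
(For anyone curious: that script is merge_tests.py under
subversion/tests/cmdline/ in the source tree. If I remember the build
setup correctly, it can be run on its own with something like

  $ make check TESTS=subversion/tests/cmdline/merge_tests.py

from the top of the build directory.)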

> > So people complain Subversion merges are slow, and they say that
> > with git/Mercurial etc. merging is fast and efficient for them.
> > Has anyone ever heard people complaining about git/Mercurial etc.
> > not tracking partial merges?
>
> I don't have experience with DVCS systems but my experience is that
> correctness comes first, then proper management of resources (look for
> and eliminate memory leaks so that operations scale well), and lastly
> try to optimize. I think the bulk of severe performance problems I
> have heard about were due to resource problems anyway, so it seems
> crucial to test large-scale operations and make sure memory isn't
> wasted or unnecessary bandwidth used.

Yes, we need to get better at memory usage. There are some nice
memory usage fixes going into 1.6.3, but there is definitely
more that we could do about it.

> Having less performance than a DVCS isn't going to make all your users
> jump ship overnight, but not being able to work on a large project at
> all will have a much quicker impact.
>
> All that said, I'm interested in hearing whether DVCS users are
> missing this feature as well.

Me too!

Thanks for your comments, they were very helpful,
Stefan
Received on 2009-06-11 20:38:53 CEST
