[RFC] Greatly improve merge performance, but break(?) edge cases

From: Paul Burba <ptburba_at_gmail.com>
Date: Thu, 9 Jul 2009 17:17:34 -0400

A thinly veiled plea for some eyes on issue #3443...

Currently merge performance is quite poor when merging to a target
with lots (think hundreds or more) of subtrees with explicit mergeinfo.
One of the main reasons is that the implicit mergeinfo of each
subtree is obtained from the server. This is described in detail in
issue #3443 http://subversion.tigris.org/issues/show_bug.cgi?id=3443
and http://svn.collab.net/repos/svn/branches/subtree-mergeinfo/notes/subtree-mergeinfo/the-performance-problem.txt
but the essential bit is this:

When performing a merge-tracking-aware merge to a target with subtrees
with explicit mergeinfo, the merge logic looks at the explicit
mergeinfo on the merge target and each subtree:

* For forward merges, any revisions from the merge source
that are already represented in that mergeinfo are *not* merged.

* For reverse merges, only revisions from the merge source
which are represented in that mergeinfo are reverse merged.
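
To make the two rules above concrete, here is a minimal sketch in Python. It is not the actual Subversion code: the function name and the use of plain sets of revision numbers are assumptions made purely for illustration.

```python
def revisions_to_merge(requested, already_merged, reverse=False):
    """Return the revisions a merge would act on (illustrative only).

    requested      -- revision numbers in the requested merge range
    already_merged -- revision numbers recorded in the mergeinfo of the
                      target (or subtree) for this merge source
    """
    requested = set(requested)
    already_merged = set(already_merged)
    if reverse:
        # Reverse merge: only undo revisions that are recorded as merged.
        return sorted(requested & already_merged, reverse=True)
    # Forward merge: skip revisions that are already recorded as merged.
    return sorted(requested - already_merged)


forward = revisions_to_merge(range(5, 11), {6, 7})
backward = revisions_to_merge(range(5, 11), {6, 7}, reverse=True)
```

With r6 and r7 already recorded, the forward merge of r5 through r10 is left with r5 and r8 through r10, while the reverse merge touches only r7 and r6.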

The logic also considers the implicit mergeinfo (a.k.a. natural history) of each
target and subtree in a similar way. The problem is, for each subtree this
means a call to svn_client__get_history_as_mergeinfo() and the expense of a
network round trip. The slower the network and/or the more subtrees with
mergeinfo, the slower the merge becomes. If the merge target has hundreds or
thousands of subtrees with explicit mergeinfo then even simple merges can become
excruciatingly slow.

In issue #3443 I've posted a patch in which the subtrees inherit the
implicit mergeinfo of the target root the same way a subtree without
explicit mergeinfo would inherit explicit mergeinfo from a parent.
This patch passes all of our tests and can result in dramatic merge
performance improvements - again see issue #3443 for an example.
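
As a rough sketch of the idea (not the patch itself), the difference between the two approaches can be modelled in Python as below. Everything here is hypothetical: fake_server_history() merely stands in for the per-path server request, and the paths and revision ranges are made up.

```python
import posixpath


def fake_server_history(path, stats):
    """Hypothetical stand-in for asking the server for a path's history."""
    stats["round_trips"] += 1
    return {path: [(1, 100)]}  # made-up natural history


def history_per_subtree(target, subtrees, stats):
    """Today's behavior: one server request for the target and each subtree."""
    return {p: fake_server_history(p, stats) for p in [target, *subtrees]}


def history_inherited(target, subtrees, stats):
    """Proposed behavior: ask once for the root, derive the rest locally."""
    root_history = fake_server_history(target, stats)
    result = {target: root_history}
    for subtree in subtrees:
        rel = posixpath.relpath(subtree, target)
        # Assume the subtree shares the root's history, adjusted by its
        # path relative to the root.
        result[subtree] = {
            posixpath.join(src, rel): ranges
            for src, ranges in root_history.items()
        }
    return result


subtrees = ["/branches/b/dir%d" % i for i in range(300)]
old_stats = {"round_trips": 0}
new_stats = {"round_trips": 0}
old = history_per_subtree("/branches/b", subtrees, old_stats)
new = history_inherited("/branches/b", subtrees, new_stats)
```

In this toy model, 300 subtrees cost 301 requests the old way and a single request the new way, and both give the same answer as long as every subtree really does share the root's history.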

The only problem is that this patch changes the behavior of merge when
a subtree with explicit mergeinfo has different history, e.g. a
subtree is deleted and replaced with a copy from a different branch or
from the same branch but at a different point in its history. In
these cases revisions might be included or excluded for merging
differently than if we ask the repos for the actual implicit mergeinfo
(what we do today).
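
A toy illustration of how the answers can diverge, again in Python. The paths, revision numbers, and the remaining() helper are all invented for this example. It assumes a subtree D that was replaced by a copy from another branch, so its real history no longer includes /trunk/D.

```python
def remaining(source_path, requested, history):
    """Revisions in `requested` from `source_path` not covered by `history`."""
    covered = set()
    for start, end in history.get(source_path, []):
        covered.update(range(start, end + 1))
    return sorted(set(requested) - covered)


# Made-up actual history: D was replaced with a copy from another branch.
actual_history = {
    "/branches/other/D": [(60, 70)],
    "/branches/b/D": [(80, 100)],
}
# Made-up inherited history: what D would get from the target root.
inherited_history = {
    "/trunk/D": [(1, 50)],
    "/branches/b/D": [(51, 100)],
}

requested = range(48, 54)  # r48 through r53 of /trunk/D
with_actual = remaining("/trunk/D", requested, actual_history)
with_inherited = remaining("/trunk/D", requested, inherited_history)
```

With the actual history all six revisions are candidates for merging into D. With the inherited history r48 through r50 are treated as already part of D's history and are skipped.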

I'm left with two basic questions:

1) Are the cases where subtrees in a merge target have different
implicit mergeinfo than the rest of the target common use cases or
highly contrived edge cases? I think the latter is true; certainly
the vanilla release and feature branch models aren't affected, but am
I missing something obvious? I've spent so much time close to this
that I might be missing the forest for the trees.

2) Assuming subtrees with differing implicit mergeinfo are edge cases,
is there any reason *not* to make the changes suggested by the patch
in issue #3443?

Any thoughts are appreciated,

Paul

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369501
Received on 2009-07-09 23:17:58 CEST
