Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 23 Mar 2010 13:28:09 +0100

On Tue, Mar 23, 2010 at 01:13:11PM +0100, Johan Corveleyn wrote:
> On Mon, Mar 22, 2010 at 11:03 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> > On Tue, Mar 23, 2010 at 00:53, Mark Phippard <markphip_at_gmail.com> wrote:
> >> On Mon, Mar 22, 2010 at 5:14 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> >>> On Mon, Mar 22, 2010 at 20:37, <hwright_at_apache.org> wrote:
> >>>> +Some other random stuff Hyrum would like to talk about:
> >>>> + * Why is merge slow (compared to $OTHER_SYSTEM)?
> >>>> + - Is it endemic to Subversion's architecture, or can it be fixed?
> >>> My opinion is that merge is slow because it's client-driven. The client
> >>> performs a lot of requests to decide what revisions and files to merge.
> >>> Just an idea: move this logic to the server side and use a slightly
> >>> extended reporter/editor to apply changes on the client.
> >>
> >> Whether it is merge or blame or something else, the reason I have
> >> heard given in the past is that SVN was designed this way for
> >> scalability. The server was supposed to just serve up revisions and
> >> leave the more expensive parts for the client. Given the amount of
> >> RAM the client can spike to at times, I cannot see this ever scaling
> >> if it were done on the server.
> >>
> > Scalability is a good reason to move operations to the client, and I
> > understand how the blame operation will impact the server. But I don't
> > see any reason why merge should take more resources than
> > update/switch/diff operations. As I understand it, during merge we
> > retrieve mergeinfo from several locations, then perform some set math
> > on it and apply revisions to the working tree.
>
> I agree. I can certainly understand that general design principle, but
> I think in general the answer is: it depends. Obviously it pays off
> that the server does _some_ work, and doesn't shove everything off to
> the client (otherwise, the server could also just stream the entire
> repository to the client for every read operation, and let it sort out
> for itself which revisions it needs, and what parts of it ;), then it
> would hardly use any RAM on the server).
>
> So I think that, for every use case, one needs to carefully balance
> the scalability of the server against the efficiency and performance
> of the operation as a whole.

In most setups I've seen, the server hardware is much beefier than
the client hardware, so unless we do things that scale really badly
(say worse than O(n^2)) I don't see a problem.

It looks like we cannot avoid pushing more work onto the server anyway
in the long run.
E.g. with editorv2, assuming we don't store copy-to information somehow,
the server will have to do some rename maths on the revision ranges it
serves so it can tell the client whether a delete is part of a move,
and what the other half of the move is. This will have to be done on the
server if the editorv2 API stays as it currently stands (it's still being
designed).
This might involve the server having to keep track of a mapping
{deleted paths -> added paths} while driving the editor, i.e. while
a client operation like merge or update is running. But I guess we
can get that to scale well if we do it right, even for very busy
repositories.
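
To make the {deleted paths -> added paths} idea concrete, here is a rough
sketch in Python. It is not Subversion or editorv2 code: the Change record
and pair_moves() are made-up names for illustration, and it assumes each
add carries its copy-from path, so that a delete can be matched with an
add that was copied from the deleted path in the same revision.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical per-path change record for one revision."""
    action: str                     # "A" for add, "D" for delete
    path: str
    copyfrom: Optional[str] = None  # source path if the add is a copy


def pair_moves(changes: Iterable[Change]) -> Dict[str, str]:
    """Return a mapping {deleted path -> added path} for likely moves.

    An add counts as the other half of a move when its copy-from
    path was deleted in the same set of changes. Only the first
    matching add is recorded for each deleted path.
    """
    changes = list(changes)
    deleted = {c.path for c in changes if c.action == "D"}
    moves: Dict[str, str] = {}
    for c in changes:
        if c.action == "A" and c.copyfrom in deleted and c.copyfrom not in moves:
            moves[c.copyfrom] = c.path
    return moves


if __name__ == "__main__":
    revision = [
        Change("D", "/trunk/a.c"),
        Change("A", "/trunk/b.c", copyfrom="/trunk/a.c"),
        Change("D", "/trunk/old.txt"),   # plain delete, no matching add
        Change("A", "/trunk/new.txt"),   # plain add, no copy-from
    ]
    print(pair_moves(revision))
```

The cost is linear in the number of changed paths per revision, so the
mapping itself should not be what hurts scalability. Tracking it across
a whole revision range is more involved and not covered here.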

Stefan
Received on 2010-03-23 13:29:14 CET
