Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 23 Mar 2010 13:13:11 +0100

On Mon, Mar 22, 2010 at 11:03 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> On Tue, Mar 23, 2010 at 00:53, Mark Phippard <markphip_at_gmail.com> wrote:
>> On Mon, Mar 22, 2010 at 5:14 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
>>> On Mon, Mar 22, 2010 at 20:37,  <hwright_at_apache.org> wrote:
>>>> +Some other random stuff Hyrum would like to talk about:
>>>> + * Why is merge slow (compared to $OTHER_SYSTEM)?
>>>> +   - Is it endemic to Subversion's architecture, or can it be fixed?
>>> My opinion is that merge is slow because it's client driven. The client
>>> performs a lot of requests to decide what revisions and files to merge.
>>> Just an idea: move this logic to server side and use slightly extended
>>> reporter/editor to apply changes on client.
>>
>> Whether it is merge or blame or something else, the reason I have
>> heard given in the past is that SVN was designed this way for
>> scalability.  The server was supposed to just serve up revisions and
>> leave the more expensive parts for the client.  Given the amount of
>> RAM the client can spike to at times, I cannot see this ever scaling
>> if it were done on the server.
>>
> Scalability is a good reason to move operations to client and I
> understand how blame operation will impact server. But I don't see
> reasons why merge should take more resources than update/switch/diff
> operations. As I understand it, during merge we retrieve mergeinfo
> from several locations, then perform some set math on them and apply
> revisions to the working tree.

I agree. I can certainly understand that general design principle, but
I think the answer is: it depends. Obviously it pays off for the server
to do _some_ work rather than shove everything off to the client.
Otherwise the server could just stream the entire repository to the
client for every read operation and let it sort out for itself which
revisions it needs, and which parts of them ;). That would hardly use
any RAM on the server.

So I think that, for every use case, one needs to carefully balance
the scalability of the server against the efficiency and performance
of the operation as a whole. This will mostly depend on the amount of
memory and CPU power needed to do the work on the server versus
sending data to the client and letting it sort things out (requesting
additional data from the server in the process). It may even be the
case that the "client does most of the work" approach is more costly
for the server in the long run, because of all the extra interactions
when the client needs additional data: maybe not in terms of maximum
memory usage, but in terms of CPU, I/O with the repository back-end,
and memory that stays tied up because the operation takes a long
time.

I'm no expert on mergeinfo, but I can imagine that some (parts) of the
algorithms can be implemented quite scalably (is that a word?) on the
server. Of course, I'm only guessing here.
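
To make the "set math" Ivan mentions a bit more concrete, here is a
minimal sketch in Python. It is not Subversion's implementation, only
an illustration under simplifying assumptions: a single path, a
revision list written like an svn:mergeinfo value ("2-4,7"), and
inclusive ranges. The function names are hypothetical.

```python
def parse_revs(revlist):
    """Parse a revision list such as '2-4,7' into a set of ints.

    Assumption: ranges are inclusive; a trailing '*' (non-inheritable
    marker) is simply stripped and otherwise ignored in this sketch.
    """
    revs = set()
    for part in revlist.split(","):
        part = part.strip().rstrip("*")
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            revs.update(range(int(start), int(end) + 1))
        else:
            revs.add(int(part))
    return revs


def to_ranges(revs):
    """Collapse a set of revisions into sorted inclusive (start, end) pairs."""
    ranges = []
    for rev in sorted(revs):
        if ranges and rev == ranges[-1][1] + 1:
            # Extend the current range by one revision.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            ranges.append((rev, rev))
    return ranges


def eligible_ranges(source_revs, merged_revlist):
    """Revisions on the source that are not yet recorded as merged."""
    return to_ranges(set(source_revs) - parse_revs(merged_revlist))


# Source has revisions 1..10; the target records 2-4 and 7 as merged.
print(eligible_ranges(range(1, 11), "2-4,7"))
```

The set difference itself is cheap. In a sketch like this the cost
would come from collecting the inputs (mergeinfo from several
locations), which is where the extra client/server requests come in.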

As for blame: sure, the current algorithm is way too heavy to put on
the server. But I'm not convinced that it has to be that way. Maybe a
faster, more efficient blame algorithm can change the equation. I
don't know enough about it yet, so I really couldn't
say. But I don't rule it out a priori. Anyway, we'll see if we get
there (the faster algo, I mean).

Johan
Received on 2010-03-23 13:14:41 CET

This is an archived mail posted to the Subversion Dev mailing list.