
Re: Performance regression with reverse merge

From: Paul Burba <ptburba_at_gmail.com>
Date: Mon, 9 Mar 2009 21:38:15 -0400

On Sun, Mar 8, 2009 at 12:44 PM, Stefan Fuhrmann
<stefanfuhrmann_at_alice-dsl.de> wrote:
> Paul Burba wrote:
>> Stefan Fuhrmann wrote:
>> >> > Paul Burba wrote:
>> >> >> but it does tell
>> >> >> us something we already knew: A significant part of merge's slowdown
>> >> >> in 1.5.0+ is due to the need to walk the working copy looking for
>> >> >> explicit subtree mergeinfo.  This is not something we can skip for
>> >> >> mergeinfo aware merges, we need to know about these subtrees.  Though
>> >> >> the ongoing WCNG work will probably make it a *lot* faster.
>> >> >
>> >> > Hm. Looking at the measured data tells a different story:
>> >> > the client is responsible for less than 1% of the runtime
>> >> > (<3s of 350s total). The very constant flow of data between
>> >> > client and server is an indication for a similar situation on
>> >> > the server.
>>
>> Hi Stefan,
>>
>> When I said "make it a *lot* faster" the "it" I was referring to was
>> the time to perform the walk to find all the subtree mergeinfo.  In my
>> testing the time spent doing the walk for subtree mergeinfo was
>> approximately 3 minutes of the 9 minute merge -- hence "a significant
>> part of merge's slowdown" (at least for me! YMMV).
>
> Perhaps we are missing each other's point. I don't question
> that the WC walk accounts for a major part of the run-time.
>
> However, neither disk I/O nor CPU is used more than marginally
> during that time. It is the network I/O that sees a constant trickle
> of small packets (4k in, 4k out per second). My conclusion is
> that during the WC walk, the client spends most time waiting
> for the server.
>
> Unless the number of C/S interactions is reduced by the WCNG
> design, I expect no significant performance improvement.
>
>> >> > That means that most of the time is spent on the network with
>> >> > 1500 .. 2000 roundtrips (given a 187ms ping). So, you are
>> >> > right that the whole WC is crawled for mergeinfo. But the
>> >> > real problem is that for every 'relevant' node, there is an
>> >> > individual communication with the server. A faster WC
>> >> > implementation alone will have no effect here.
>>
>> Agreed, but again, for me the crawl is a significant chunk of time.
>>
>> > IMHO, there are two approaches to speeding things up:
>> >
>> > * stream / interleave C/S communication
>> >  (would serf be of any help here?)
>>
>> It seems so yes.  Could you try the merge with serf and see what you
>> find?  I saw a pretty dramatic improvement using serf over neon:
>>
>> 1.6.0.RC3.RA_NEON>C:\SVN\TSVN>timethis svn merge -r 15456:15455
>> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>>
>> TimeThis :  Command Line :  svn merge -r 15456:15455
>> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>> TimeThis :    Start Time :  Wed Mar 04 10:02:29 2009
>>
>> --- Reverse-merging r15456 into
>> 'src\TortoiseProc\RevisionGraph\RevisionGraphDlg.cpp':
>> U    src\TortoiseProc\RevisionGraph\RevisionGraphDlg.cpp
>> --- Reverse-merging r15456 into
>> 'src\TortoiseProc\RevisionGraph\RevisionGraphDlg.h':
>> U    src\TortoiseProc\RevisionGraph\RevisionGraphDlg.h
>>
>> TimeThis :  Command Line :  svn merge -r 15456:15455
>> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>> TimeThis :    Start Time :  Wed Mar 04 10:02:29 2009
>> TimeThis :      End Time :  Wed Mar 04 10:11:36 2009
>> TimeThis :  Elapsed Time :  00:09:07.140
>>
>> 1.6.0.RC3.RA_SERF>C:\SVN\TSVN>timethis svn merge -r 15456:15455
>> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>>
>> TimeThis :  Command Line :  svn merge -r 15456:15455
>> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>> TimeThis :    Start Time :  Wed Mar 04 10:12:32 2009
>>
>> --- Reverse-merging r15456 into
>> 'src\TortoiseProc\RevisionGraph\RevisionGraphDlg.cpp':
>> U    src\TortoiseProc\RevisionGraph\RevisionGraphDlg.cpp
>> --- Reverse-merging r15456 into
>> 'src\TortoiseProc\RevisionGraph\RevisionGraphDlg.h':
>> U    src\TortoiseProc\RevisionGraph\RevisionGraphDlg.h
>>
>> TimeThis :  Command Line :  svn merge -r 15456:15455
>> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>> TimeThis :    Start Time :  Wed Mar 04 10:12:32 2009
>> TimeThis :      End Time :  Wed Mar 04 10:16:36 2009
>> TimeThis :  Elapsed Time :  00:04:04.421
>>
>> Still pokey, but it's something.  And as I said, for me almost 3
>> minutes of this time is spent walking the target tree looking for
>> mergeinfo.  WCNG should drop that to a fraction of the time, so then
>> we could be looking at about a minute.

Doh, sorry, I was mistaken, the walk for me is closer to one minute, not three!

> Here we go (SVN 1.6.0-RC3, i686 LINUX):
>
> $time ./svn co http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk ~/TSVN -r
> 15533 --ignore-externals > /dev/null
>
>        serf            neon
> real    2m41.073s       2m15.268s
> user    0m54.107s       0m6.840s
> sys     1m25.297s       0m3.672s
>
> $time ./svn merge -r 15456:15455
> http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk ~/TSVN
>
>        serf            neon
> real    5m29.002s       6m43.831s
> user    0m1.992s        0m2.124s
> sys     0m0.824s        0m0.664s
>
> Hm. Merge is 20% faster over Serf but c/o is 20% slower.
> More importantly, c/o is CPU-bound with Serf!
> That is really unexpected.

Hmmmm, that is unexpected.
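
For reference, the "20%" figures and the "CPU-bound" observation can be checked against the `time` output quoted above. The following is only a rough sketch: the numbers are copied from Stefan's measurements, while the helper names (`to_seconds`, `relative_change`, `cpu_share`) are made up for illustration.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '5m29.002s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Figures copied from the measurements quoted above (SVN 1.6.0-RC3, Linux).
TIMINGS = {
    "checkout": {
        "serf": {"real": "2m41.073s", "user": "0m54.107s", "sys": "1m25.297s"},
        "neon": {"real": "2m15.268s", "user": "0m6.840s", "sys": "0m3.672s"},
    },
    "merge": {
        "serf": {"real": "5m29.002s", "user": "0m1.992s", "sys": "0m0.824s"},
        "neon": {"real": "6m43.831s", "user": "0m2.124s", "sys": "0m0.664s"},
    },
}

def relative_change(operation):
    """Wall-clock time of serf relative to neon, in percent (negative = faster)."""
    serf = to_seconds(TIMINGS[operation]["serf"]["real"])
    neon = to_seconds(TIMINGS[operation]["neon"]["real"])
    return (serf - neon) / neon * 100.0

def cpu_share(operation, library):
    """Fraction of wall-clock time spent on the CPU (user + sys)."""
    t = TIMINGS[operation][library]
    return (to_seconds(t["user"]) + to_seconds(t["sys"])) / to_seconds(t["real"])

for operation in TIMINGS:
    print(f"{operation}: serf vs. neon {relative_change(operation):+.1f}%")
    for library in ("serf", "neon"):
        print(f"  {library}: CPU share {cpu_share(operation, library):.1%}")
```

With these inputs the checkout over serf takes roughly 19% longer and the merge roughly 18.5% less time than over neon, and the serf checkout keeps the CPU busy for well over 80% of its wall-clock time, against under 10% for neon.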

>> Beyond that, improvements are going to be harder to come by in this
>> example because http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
>> has so much explicit subtree mergeinfo, *226* paths in fact!  I assume
>> this is what you are referring to as "every 'relevant' node".  Do you
>> know how all of this mergeinfo came about?  Do you do 'subtree
>> merges'? (i.e. a merge directly targeting a subtree of a branch rather
>> than its root).
>
> TSVN developers use SVN close to HEAD. It seems
> that most of this merge info was added during 1.5
> development (around Oct 2007), i.e. the result
> of alpha-quality code.
>
> From what I can tell, most of that merge info can
> be deleted because it refers to revisions that did
> not change the source node. Is it safe to do that or
> will I screw up the current merge tracking logic?

It really depends on what the subtree mergeinfo is overriding (i.e.
what would the path inherit for mergeinfo if it had no mergeinfo of
its own). Most of the subtree mergeinfo on
http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk is coming from the
'/branches/1.4.x/' and '/branches/LogCacheEnhancement' branches.
Assuming you won't be merging from those branches to trunk (directly
or from copies of those branches) then I am quite confident you can
safely delete the mergeinfo from those subtrees. In fact, I think a
lot of this mergeinfo would have elided away a long time ago, but for
the fact that there are some path differences between the mergeinfo
path and the path with mergeinfo. If that didn't make sense, here is
an example of what I mean:

The mergeinfo on trunk is:
  svn:mergeinfo
    /branches/1.4.x:9134
    /branches/LogCacheEnhancement:9629-9630,9653,9691-9692,9860,9864,9867,9869-9872,9876-9880,10561-10564,10601-10706,10821-10866,10871-10887,11153,11155,11234-11241,11243,11246,11248-11252

The mergeinfo on 'trunk/src/LogCache/Streams/BLOBInStream.h' is:
                                     ^^^^^^^
  svn:mergeinfo
    /branches/1.4.x/src/LogCache/BLOBInStream.h:9134
    /branches/LogCacheEnhancement/src/LogCache/BLOBInStream.h:9629-9630,9653,9691-9692,9860,9864,9867,9869-9872,9876-9880,10561-10564,10601-10706,10821-10866,10871-10887,11153,11155,11234-11241,11243,11246,11248-11252

But for the path difference accounted for by "Streams" the mergeinfo
on BLOBInStream.h would have elided to the root of trunk during any
merge to trunk. There are scores of other paths with explicit
mergeinfo that are in a similar situation.

>> Anyhow, I'll be looking at the code again to see where improvements can be
>> made.
>
> Thanks!

In r36444 I made a minor tweak to the merge code that potentially
avoids a network round trip for some or all of the subtrees with
explicit mergeinfo in the merge target. In your original example this
optimization is in full effect and now the merge takes just over two
minutes for me (using serf):

C:\SVN\TSVN.trunk>timethis svn merge -r 15456:15455
http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk

TimeThis : Command Line : svn merge -r 15456:15455
http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
TimeThis : Start Time : Mon Mar 09 13:59:07 2009

--- Reverse-merging r15456 into
'src\TortoiseProc\RevisionGraph\RevisionGraphDlg.cpp':
U src\TortoiseProc\RevisionGraph\RevisionGraphDlg.cpp
--- Reverse-merging r15456 into
'src\TortoiseProc\RevisionGraph\RevisionGraphDlg.h':
U src\TortoiseProc\RevisionGraph\RevisionGraphDlg.h

TimeThis : Command Line : svn merge -r 15456:15455
http://tortoisesvn.tigris.org/svn/tortoisesvn/trunk
TimeThis : Start Time : Mon Mar 09 13:59:07 2009
TimeThis : End Time : Mon Mar 09 14:01:18 2009
TimeThis : Elapsed Time : 00:02:11.296

This is still slow, but it is a big improvement. I'm looking into further
improvements, so hopefully there is more to come.
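
As a back-of-the-envelope illustration of why saving round trips matters, the latency cost can be estimated from figures quoted in this thread: the 187ms ping and the 1500 to 2000 round trips are Stefan's numbers, and 226 is the count of paths with explicit mergeinfo on trunk. The model below is an assumption: requests are taken to be strictly sequential, server processing time is ignored, and the latency of my own connection may well differ.

```python
PING_SECONDS = 0.187          # round-trip latency reported by Stefan
SUBTREES_WITH_MERGEINFO = 226  # paths with explicit mergeinfo on trunk

def latency_cost(round_trips, ping=PING_SECONDS):
    """Seconds spent purely waiting on the network, assuming every
    request is issued only after the previous one has completed."""
    return round_trips * ping

# Saving one round trip per subtree with explicit mergeinfo:
per_subtree = latency_cost(SUBTREES_WITH_MERGEINFO)
print(f"one round trip per subtree: {per_subtree:.1f}s")

# Stefan's estimate of the total number of round trips for the merge:
for trips in (1500, 2000):
    print(f"{trips} round trips: {latency_cost(trips):.1f}s")
```

Under these assumptions each request saved per subtree is worth about 42 seconds, and 1500 to 2000 round trips come to about 280 to 374 seconds, which brackets the 350s total Stefan measured.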

Paul

> Since larger projects will probably use local
> merges, they could also produce a larger number
> of local svn:mergeinfo.
>
> -- Stefan^2.
>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1300254
Received on 2009-03-10 03:08:08 CET

This is an archived mail posted to the Subversion Dev mailing list.
