On 29.09.2010 10:45, Ramkumar Ramachandra wrote:
> Hi Philip,
>
> Philip Martin writes:
>> The performance of svnrdump is likely to be dominated by IO from the
>> repository, network or disk depending on the RA layer. strace is a
>> useful tool to see opens/reads/writes. You can see what order the
>> calls occur, how many there are, how big they are and how long they
>> take.
> Ah, thanks for the tip.
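For anyone who wants to repeat this kind of measurement, a typical strace
run might look like the following (the repository path is a placeholder,
of course):

```shell
# Syscall summary: counts, total time and time per call, children included
strace -c -f svnrdump dump file:///path/to/repo > /dev/null

# Or log individual calls with their durations (-T) to a file
strace -f -T -e trace=open,read,write -o svnrdump.strace \
    svnrdump dump file:///path/to/repo > /dev/null
```

The summary mode (-c) is usually enough to see whether the run is
dominated by reads from the repository or by network traffic.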
My measurements seem to support what Philip wrote:
the expensive part runs on the server. Even with my
optimized server, svnrdump's CPU usage is less than
the time taken by the server. Some numbers (hot file
cache):
svnadmin dump
  1.7 trunk                   70s real  66s user  4s system
  perf-branch                 30s real  28s user  2s system

1.7 trunk svnrdump
  ra-local                    88s real  81s user  7s system
  svn: (1.7 trunk)            99s real   6s user  4s system
  svn: (perf-branch, cold)    72s real   5s user  6s system
  svn: (perf-branch, hot)     17s real   5s user  5s system
Thus, svnrdump is slower only for ra-local where it is
of no particular use in the first place. To really speed
things up, the caching infrastructure from the performance
branch should be merged into /trunk.
>> Valgrind/Callgrind is good and doesn't require you to instrument the
>> code, but it does help to build with debug information. It does
>> impose a massive runtime overhead.
> I don't mind -- I'm mostly using some remote machines to gather the
> profiling data :)
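In case someone wants to reproduce such numbers, a callgrind run could
look roughly like this (the URL and revision range are placeholders):

```shell
# Build with debug info first (e.g. CFLAGS="-g -O2"), then run under
# callgrind; expect a very large slowdown compared to a plain run
valgrind --tool=callgrind --callgrind-out-file=callgrind.out.svnrdump \
    svnrdump dump http://svn.example.org/repos/svn -r 0:1000 > /dev/null

# Quick text summary; the same file can be opened in KCachegrind
callgrind_annotate callgrind.out.svnrdump | head -n 40
```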
>
>> This is what I get when dumping 1000 revisions from a local mirror of
>> the Subversion repository over ra_neon:
>>
>> CPU: Core 2, speed 1200 MHz (estimated)
>> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
>> samples % app name symbol name
>> 4738 41.1893 no-vmlinux (no symbols)
>> 1037 9.0150 libxml2.so.2.6.32 (no symbols)
>> 700 6.0854 libneon.so.27.1.2 (no symbols)
>> 238 2.0690 libc-2.7.so _int_malloc
>> 228 1.9821 libc-2.7.so memcpy
>> 221 1.9212 libc-2.7.so memset
>> 217 1.8865 libc-2.7.so strlen
>> 191 1.6604 libsvn_subr-1.so.0.0.0 decode_bytes
>> 180 1.5648 libc-2.7.so vfprintf
>> 171 1.4866 libc-2.7.so strcmp
>> 153 1.3301 libapr-1.so.0.2.12 apr_hashfunc_default
>> 134 1.1649 libapr-1.so.0.2.12 apr_vformatter
>> 130 1.1301 libapr-1.so.0.2.12 apr_palloc
>>
>> That's on my Debian desktop. At the recent Apache Retreat I tried to
>> demonstrate OProfile on my Ubuntu laptop and could not get it to work
>> properly, probably because I forgot about -fno-omit-frame-pointer.
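For the record, getting usable call stacks out of OProfile requires
keeping frame pointers; the whole cycle looks roughly like this
(legacy opcontrol interface, paths are placeholders):

```shell
# Keep frame pointers so the profiler can unwind call stacks
./configure CFLAGS="-g -O2 -fno-omit-frame-pointer"
make

# Sample the run, recording call graphs up to 16 frames deep
opcontrol --start --callgraph=16
svnrdump dump file:///path/to/repo > /dev/null
opcontrol --stop

# Per-symbol costs plus caller/callee information
opreport --callgraph --symbols $(which svnrdump)
```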
> Ah, now I see why it didn't work for me. The data from Callgrind is
> very interesting -- it seems to suggest that APR hashtables are
> prohibitively expensive.
>
> @Stefan: Thoughts on hacking APR hashtables directly?
>
Are you sure?! Which operation is the most expensive one
and how often is it called? Who calls it and why?
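Those questions can usually be answered directly from the callgrind
data, e.g. (the output file name is whatever your run produced):

```shell
# Inclusive cost per function, filtered to the APR hash implementation
callgrind_annotate --inclusive=yes callgrind.out.<pid> | grep -i apr_hash

# KCachegrind shows the same data with callers, callees and a call graph
kcachegrind callgrind.out.<pid>
```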
>> Finally there is traditional gprof. It's a long time since I used it
>> so I don't remember the details. You instrument the code at compile
>> time using CFLAGS=-pg. If an instrumented function foo calls into a
>> library bar that is not instrumented then bar is invisible, all you
>> see is how long foo took to execute.
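For completeness, the gprof cycle goes roughly like this (paths are
placeholders):

```shell
# Instrument at compile time; every function gets an mcount() call
./configure CFLAGS="-g -pg"
make

# Running the instrumented binary writes gmon.out into the CWD
svnrdump dump file:///path/to/repo > /dev/null

# Flat profile plus call graph; uninstrumented libraries stay invisible
gprof ./svnrdump gmon.out | less
```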
> Yes, I used gprof initially. Callgrind is WAY more useful.
>
At least the results are much more useful when there is
a tool like KCachegrind that allows easy navigation through
the huge amount of information that was gathered.
-- Stefan^2.
Received on 2010-09-30 01:55:27 CEST