Daniel Berlin wrote:
>I have placed a test repo dump file that should show this problem at
>http://www.toolchain.org/~dberlin/dumpfile.bz2
>
>It's a bit over 1 meg compressed
>
>svn blame on trunk/combine.c or trunk/rtl.h shows the problems below
>(rtl.h is the faster of the two)
>
>
>On Wed, 2005-02-09 at 16:57 -0500, Daniel Berlin wrote:
>
>
>>So I went to determine where most of the time on gcc blames was being
>>spent.
>>
>>Out of 300 seconds taken on one file, roughly 65% of the time is being
>>spent composing deltas:
>>
>>   %    cumulative   self                self    total
>>  time    seconds   seconds     calls   /call    /call   name
>> 43.71    115.64    115.64   95953710   0.00     0.00   search_offset_index
>> 13.34    150.93     35.29    3119295   0.00     0.00   copy_source_ops
>>  8.17    204.50     21.62       4746   0.00     0.04   svn_txdelta__compose_windows
>>
>>
No, the delta combiner isn't slow. And your data seem to be skewed, as I
get similar call counts but quite different timing results (I'm using an
instrumenting profiler, not a sampling one). Running with current trunk,
with both the WC and the repo on a RAM disk, measuring only svnserve and
ignoring network I/O I get this for rtl.h:
Function                         Calls     F%   F+D%
compose_handler                  13464   0.00  33.31
svn_txdelta__compose_windows      5130   3.40  33.15
compute_window                     535   0.00  20.52
svn_txdelta__vdelta                535   0.00  20.52
vdelta                            1070  17.78  17.80
svn_fs_bdb__string_read          17273   0.05  17.12
copy_source_ops                5091068   3.86  14.32
svn_txdelta__insert_op        13710294  10.11  11.07
apr_md5_update                   12849   0.16   9.13
MD5Transform                   1944240   6.79   8.97
locate_key                       19953   0.15   6.48
search_offset_index           10182136   5.77   5.77
(F% is time spent in the function itself, F+D% is the time used by the
function and its children)
Note that I'm looking at code compiled with optimisation, which could be
quite important in search_offset_index, for example.
I don't think there's much that can be done to reduce the computation
for a single delta combination. Oh, we could optimise away some memcpy
calls and allocations, at the price of considerably more complicated
structures, but that would only lop off a few percent.
The trouble lies in the number of times svn_txdelta__compose_windows is
called for a mere 536 revisions of rtl.h (the file is small enough to
fit into a single window). If I read the code correctly, that's because
the blame computation rebuilds the fulltext for each revision from
scratch, instead of caching the previous fulltext and constructing the
new one from it (using the delta that's often already in the
repository). That's a bit tricky in the presence of skip deltas, but the
main problem is that the FS doesn't have even a private API that would
take a revision+text and return the text for revision+1, so blame
currently doesn't have much choice here.
If that part were optimised, that would reduce the time by about 20%. If
blame were smart enough to not recompute deltas that it can get straight
from the FS, we'd probably gain another 10%.
-- Brane
Received on Thu Feb 10 06:11:01 2005