[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn diff optimization to make blame faster?

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Thu, 19 Aug 2010 13:54:30 +0200

On Wed, Aug 18, 2010 at 3:44 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Wed, Aug 18, 2010 at 12:49 PM, Stefan Sperling <stsp_at_elego.de> wrote:
>> Can you show a profiler run that illustrates where the client is
>> spending most of its time during diff? That would probably help with
>> getting opinions from people, because it saves them from spending time
>> doing this research themselves.
>> You've already hinted at svn_diff__get_tokens() in another mail, but
>> a real profiler run would show more candidates.
>
> Sorry, I'm still very much a beginner in C (I've been programming for
> 10 years, but mainly in Java). Especially the tooling is a steep part
> of the learning curve :-), so I don't know (how to use) a profiler
> yet. Any suggestions? I'm on Windows (XP), using VCE 2008, and used
> cygwin to compare with GNU diff.

I googled around a bit for C profilers on Windows. I found several
which seem useful:
- very sleepy (http://www.codersnotes.com/sleepy/sleepy)
- Shiny (http://sourceforge.net/projects/shinyprofiler/)
- AMD CodeAnalyst (http://developer.amd.com/cpu/CodeAnalyst/Pages/default.aspx)
- AQTime - not free, but has a trial of 30 days
(http://www.automatedqa.com/products/aqtime/)

Before I dive in and try them out: any preference/favorites from the
windows devs on this list? Or other suggestions?

Further, some additional context to the "manual-profiling numbers": see below.

> For the time being, I've helped myself using "apr_time_now()" from
> apr_time.h and printf statements. Not terribly accurate and probably
> somewhat overheadish, but it gives me some hints about the
> bottlenecks.
>
> FWIW, below is the output of a recent run with my instrumented trunk
> build. I've instrumented svn_diff_diff in diff.c and
> svn_diff__get_tokens in token.c. I checked out bigfile.xml from a
> repository, and edited a single line of it in my working copy. The two
> statements "Starting diff" and "Diff finished" are the first and last
> statements inside the svn_diff_diff function. These are numbers from
> the second run (any following runs show approximately the same
> numbers).
>
> $ time svn diff bigfile.xml
> Starting diff ... (entered svn_diff_diff in diff.c)
>  -> calling svn_diff__get_tokens for svn_diff_datasource_original
>      == svn_diff__get_tokens datasource_open: 0 usec
>      == svn_diff__get_tokens while loop: 265625 usec
>         calls to datasource_get_next_token: 62500 usec
>                svn_diff__tree_insert_token: 171875 usec
>                                       rest: 15625 usec
>  <- done: 281250 usec
>  -> calling svn_diff__get_tokens for svn_diff_datasource_modified
>      == svn_diff__get_tokens datasource_open: 234375 usec
>      == svn_diff__get_tokens while loop: 312500 usec
>         calls to datasource_get_next_token: 62500 usec
>                svn_diff__tree_insert_token: 171875 usec
>                                       rest: 46875 usec
>  <- done: 562500 usec
>  -> calling svn_diff__lcs
>  <- done: 0 usec
>  -> calling svn_diff__diff
>  <- done: 0 usec
> Diff finished in 875000 usec (875 millis)
> Index: bigfile.xml
> ===================================================================
> [snip]
>
> real    0m1.266s
> user    0m0.015s
> sys     0m0.031s

For comparison: a debug build from trunk, with only one
instrumentation spot (at start and end of svn_diff_diff):
$ time svn diff bigfile.xml
Starting diff ... (entered svn_diff_diff in diff.c)
Diff finished in 703125 usec (703 millis)
[snip]
real 0m1.109s
user 0m0.015s
sys 0m0.031s

So the instrumentation in the critical loop probably costs me around
150-200 ms.

A release build will also probably be faster, but I have no time to
make one and test it now. If I time my 1.6.5 build from tigris.org,
that's still a lot faster:
$ time svn diff bigfile.xml
[snip]
real 0m0.468s
user 0m0.015s
sys 0m0.031s

Maybe that can be totally attributed to the build (release vs. debug),
or maybe 1.6.5 was faster at svn diff than trunk is?

> Some observations:
> - svn_diff__get_tokens takes most of the time
> - for some reason, in the case of datasource_modified, the call to
> datasource_open takes a long time (234 ms). In case of
> datasource_original, it's instantaneous. Maybe because of translation,
> ... of the pristine file? But I'd think that would be the original??
> - apart from that, most of the time goes to the while loop in
> svn_diff__get_tokens.
> - Inside the while loop, most time is spent on calls to
> svn_diff__tree_insert_token (which compares tokens (=lines) to insert
> them into some tree structure). Calls to datasource_get_next_token
> also take some time.
>
> I'm not too sure of the accuracy of those last numbers with my simple
> profiling method, because it's the accumulated time of calls inside a
> while loop (with 61000 iterations).
>
>
> For completeness, the same diff with /usr/bin/diff (under cygwin), of
> the edited bigfile.xml vs. the original, as of the second diff run:
>
> $ ls -l settings.xml
> -rwxr-xr-x+ 1 User None 1447693 2010-08-17 14:20 bigfile.xml
> $ time diff bigfile.xml bigfile_orig.xml
> [snip]
>
> real    0m0.078s
> user    0m0.046s
> sys     0m0.046s
>
> Cheers,
> --
> Johan

-- 
Johan
Received on 2010-08-19 13:55:25 CEST

This is an archived mail posted to the Subversion Dev mailing list.