[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn diff optimization to make blame faster?

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Thu, 19 Aug 2010 23:38:11 +0200

On Thu, Aug 19, 2010 at 1:54 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Wed, Aug 18, 2010 at 3:44 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> FWIW, below is the output of a recent run with my instrumented trunk
>> build. I've instrumented svn_diff_diff in diff.c and
>> svn_diff__get_tokens in token.c. I checked out bigfile.xml from a
>> repository, and edited a single line of it in my working copy. The two
>> statements "Starting diff" and "Diff finished" are the first and last
>> statements inside the svn_diff_diff function. These are numbers from
>> the second run (any following runs show approximately the same
>> numbers).
>>
>> $ time svn diff bigfile.xml
>> Starting diff ... (entered svn_diff_diff in diff.c)
>>  -> calling svn_diff__get_tokens for svn_diff_datasource_original
>>      == svn_diff__get_tokens datasource_open: 0 usec
>>      == svn_diff__get_tokens while loop: 265625 usec
>>         calls to datasource_get_next_token: 62500 usec
>>                svn_diff__tree_insert_token: 171875 usec
>>                                       rest: 15625 usec
>>  <- done: 281250 usec
>>  -> calling svn_diff__get_tokens for svn_diff_datasource_modified
>>      == svn_diff__get_tokens datasource_open: 234375 usec
>>      == svn_diff__get_tokens while loop: 312500 usec
>>         calls to datasource_get_next_token: 62500 usec
>>                svn_diff__tree_insert_token: 171875 usec
>>                                       rest: 46875 usec
>>  <- done: 562500 usec
>>  -> calling svn_diff__lcs
>>  <- done: 0 usec
>>  -> calling svn_diff__diff
>>  <- done: 0 usec
>> Diff finished in 875000 usec (875 millis)
>> Index: bigfile.xml
>> ===================================================================
>> [snip]
>>
>> real    0m1.266s
>> user    0m0.015s
>> sys     0m0.031s
[snip]
>
> A release build will also probably be faster, but I have no time to
> make one and test it now. If I time my 1.6.5 build from tigris.org,
> that's still a lot faster:
> $ time svn diff bigfile.xml
> [snip]
> real    0m0.468s
> user    0m0.015s
> sys     0m0.031s
>
> Maybe that can be totally attributed to the build (release vs. debug),
> or maybe 1.6.5 was faster at svn diff than trunk is?

Feh, I just redid my apr_time_now+printf profiling with a release
build (of trunk), instead of a debug build, and that makes a *big*
difference. Total time of the svn_diff_diff call is now down to ~300
ms. Still a lot slower than GNU diff (78 ms), but a lot better than
with the debug build. That will teach me not to do performance tests
with a debug build.

Still, the challenge holds: why is svn diff ~4 times slower than GNU diff?

Also, still puzzling to me: why does datasource_open take so long in
the "modified" case? Now it takes twice as long as the while loop ...

Now, I'll spend some time to do some real profiling with "very
sleepy". In the meantime, if anybody has any brilliant ideas/insight,
please let me know ...

Numbers below (release build of trunk_at_yesterday on Windows,
instrumented with apr_time_now and printf):

$ time svn diff bigfile.xml
Starting diff ... (entered svn_diff_diff in diff.c)
  -> calling svn_diff__get_tokens for svn_diff_datasource_original
      == svn_diff__get_tokens datasource_open: 0 usec
      == svn_diff__get_tokens while loop: 46875 usec
         calls to datasource_get_next_token: 15625 usec
                svn_diff__tree_insert_token: 0 usec
                                       rest: 0 usec
  <- done: 78125 usec
  -> calling svn_diff__get_tokens for svn_diff_datasource_modified
      == svn_diff__get_tokens datasource_open: 125000 usec
      == svn_diff__get_tokens while loop: 62500 usec
         calls to datasource_get_next_token: 31250 usec
                svn_diff__tree_insert_token: 31250 usec
                                       rest: 0 usec
  <- done: 218750 usec
Diff finished in 296875 usec (296 millis)
[snip]

real 0m0.531s
user 0m0.015s
sys 0m0.015s

Cheers,

-- 
Johan
Received on 2010-08-19 23:39:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.