[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Fix O(n) runtime in cache lookup, part 1/2

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Wed, 21 Apr 2010 03:32:53 +0200

Philip Martin wrote:
> Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de> writes:
>
>
>> Philip Martin wrote:
>>
>>> Stefan Fuhrman <stefanfuhrmann_at_alice-dsl.de> writes:
>>>
>>>
>>>
>>>> svn_cache_t lookup returns a copy of the value found.
>>>> For objects of variable size, this can be expensive. If you
>>>> checkout / export all N files of a specific directory, for
>>>> instance, every single path gets verified. The directory
>>>> information is cached for FSFS but the copy operation
>>>> will still be O(N).
>>>>
>>>> As a result, the whole check takes O(N2). The good
>>>>
>>>>
>>> That sounds bad. Do you have any measurements to show how much
>>> improvement your patch gives?
>>>
>>>
>> Well, choose your N ;) For low square mean numbers of files
>> per folder as in TSVN and SVN, the gains were between 3
>> and 4 percent of total "svn export" runtime. If you want me to,
>> I could run a test with 10k files in a folder (probably a realistic
>> upper limit).
>>
>
> Sounds promising. In this case I think server performance is the more
> interesting. If you can be bothered then run a server, do a checkout
> of a 1K or a 10k directory (probably many times) and look at CPU or
> memory usage of the server. But if you are getting 3% on the client
> then the server is probably getting a bigger gain.
>
Took me a while to get to it but here are the results.
I created a format 5 FSFS repository with folders
containing 2^N almost empty files. Then I ran ls and
export on svn://localhost - with in svnserve patched
and non-patched variants.

The following table shows the "real" part of "time svn ..."
on the client side. As far as I can tell, svnserve spawned
a child process for each command and that child process
was busy during the command execution (e.g. kept 1 core
@ 100% load).

Files ls-1.7 ls-patched export-1.7 export-patched
    1 0.045s 0.046s 0.046s 0.045s
    2 0.045s 0.045s 0.046s 0.045s
    4 0.048s 0.045s 0.046s 0.047s
    8 0.045s 0.044s 0.085s 0.086s
   16 0.045s 0.046s 0.086s 0.086s
   32 0.049s 0.048s 0.091s 0.087s
   64 0.088s 0.089s 0.092s 0.100s
  128 0.098s 0.094s 0.104s 0.119s
  256 0.126s 0.107s 0.144s 0.154s
  512 0.208s 0.132s 0.218s 0.210s
 1024 0.350s 0.177s 0.390s 0.318s
 2048 0.899s 0.251s 0.970s 0.458s
 4096 3.058s 0.321s 3.413s 0.458s
 8192 11.322s 0.601s 11.712s 0.752s
16384 44.282s 1.079s 44.234s 1.343s

So, there is clearly a O(N^2) runtime that becomes relevant
at a few hundred entries in a folder. For the very small file
sizes (<20 bytes) used here, svn export is mainly limited
by server-side path validation.

As for the patch itself, I think I will rewrite it to either allow
for multiple pins or to follow an svn_wc__call_with_write_lock-
like approach.

-- Stefan^2.
Received on 2010-04-21 03:33:18 CEST

This is an archived mail posted to the Subversion Dev mailing list.