[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Fix O(n) runtime in cache lookup, part 1/2

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Wed, 21 Apr 2010 11:27:15 +0100

On Wed, 2010-04-21, Stefan Fuhrmann wrote:
> Philip Martin wrote:
> > Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de> writes:
> > Philip Martin wrote:
> >> Stefan Fuhrman <stefanfuhrmann_at_alice-dsl.de> writes:
> >>>> svn_cache_t lookup returns a copy of the value found.
> >>>> For objects of variable size, this can be expensive. If you
> >>>> checkout / export all N files of a specific directory, for
> >>>> instance, every single path gets verified. The directory
> >>>> information is cached for FSFS but the copy operation
> >>>> will still be O(N).
> >>>>
> >>>> As a result, the whole check takes O(N2). The good
> >>>>
> >>> That sounds bad. Do you have any measurements to show how much
> >>> improvement your patch gives?
> >>>
> >> Well, choose your N ;) For low square mean numbers of files
> >> per folder as in TSVN and SVN, the gains were between 3
> >> and 4 percent of total "svn export" runtime. If you want me to,
> >> I could run a test with 10k files in a folder (probably a realistic
> >> upper limit).
> >
> > Sounds promising. In this case I think server performance is the more
> > interesting. If you can be bothered then run a server, do a checkout
> > of a 1K or a 10k directory (probably many times) and look at CPU or
> > memory usage of the server. But if you are getting 3% on the client
> > then the server is probably getting a bigger gain.
> >
> Took me a while to get to it but here are the results.
> I created a format 5 FSFS repository with folders
> containing 2^N almost empty files. Then I ran ls and
> export on svn://localhost - with in svnserve patched
> and non-patched variants.
>
> The following table shows the "real" part of "time svn ..."
> on the client side. As far as I can tell, svnserve spawned
> a child process for each command and that child process
> was busy during the command execution (e.g. kept 1 core
> @ 100% load).
>
> Files ls-1.7 ls-patched export-1.7 export-patched
> 1 0.045s 0.046s 0.046s 0.045s
> 2 0.045s 0.045s 0.046s 0.045s
> 4 0.048s 0.045s 0.046s 0.047s
> 8 0.045s 0.044s 0.085s 0.086s
> 16 0.045s 0.046s 0.086s 0.086s
> 32 0.049s 0.048s 0.091s 0.087s
> 64 0.088s 0.089s 0.092s 0.100s
> 128 0.098s 0.094s 0.104s 0.119s
> 256 0.126s 0.107s 0.144s 0.154s
> 512 0.208s 0.132s 0.218s 0.210s
> 1024 0.350s 0.177s 0.390s 0.318s
> 2048 0.899s 0.251s 0.970s 0.458s
> 4096 3.058s 0.321s 3.413s 0.458s
> 8192 11.322s 0.601s 11.712s 0.752s
> 16384 44.282s 1.079s 44.234s 1.343s
>
> So, there is clearly a O(N^2) runtime that becomes relevant
> at a few hundred entries in a folder. For the very small file
> sizes (<20 bytes) used here, svn export is mainly limited
> by server-side path validation.

A set of measurements like that is a *really* quick and convincing
demonstration of the value of this patch. Thanks!

- Julian

> As for the patch itself, I think I will rewrite it to either allow
> for multiple pins or to follow an svn_wc__call_with_write_lock-
> like approach.
>
> -- Stefan^2.
>
Received on 2010-04-21 12:27:52 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.