
Re: xdelta, svndiff, zlib, or some other problem

From: Mark Phippard <markphip_at_gmail.com>
Date: Fri, 2 Jul 2010 07:39:20 -0400

There is a common problem where people get weird performance spikes
like this. It is caused by the server not having enough entropy, so
that some code on the server that generates a random number takes
forever.

Go here: http://svn.haxx.se/ and search for entropy to read all the threads.

It might be worth looking into. You can fix it easily by reconfiguring
APR to get its random numbers from /dev/urandom instead of /dev/random.
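
Before rebuilding anything, it may help to check whether the entropy pool
is actually low while the slow operation is running. Below is a minimal
sketch, assuming a Linux server that exposes the kernel's estimate at
/proc/sys/kernel/random/entropy_avail; the function name is made up for
illustration and is not part of APR or Subversion. For the rebuild itself,
APR's configure script has a --with-devrandom option in the versions I know
of; check ./configure --help on yours.

```python
# Path is Linux-specific (an assumption); other systems will return None.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if it
    cannot be read (file missing, not Linux, unexpected content)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    bits = entropy_available()
    if bits is None:
        print("entropy estimate not available on this system")
    else:
        print("entropy_avail: %d bits" % bits)
```

If the number stays very low while the operation hangs, starvation of
/dev/random is a plausible suspect; if it stays high, look elsewhere.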

On Thu, Jul 1, 2010 at 11:13 PM, Edward Ned Harvey <svn_at_nedharvey.com> wrote:
> I'll repost with more specifics once I have them, but for now, I'm just asking for advice on how to get better specifics.
>
> There is some sort of problem, where sometimes, a commit or other operation which should take ~10sec instead requires ~15min.  It is reproducible, but it depends on the data being committed, and currently the data being committed is private, so I can't demonstrate the problem to the outside world.
>
> I tried reproducing the problem using random data, but it didn't happen.  I tried introducing some structure to the random data, but it still didn't happen.
>
> The data in question is ~45M data files.  I have several different versions of the same file, as generated by engineers who reported the problem.  In an attempt to better understand the data structure inside the files, I did a rolling md5, of every 1M chunk of the file, and then diff'd the md5's and found that approx 1 in 20 of the 1M chunks match from version to version, so from version to version, some large sections of the file have changed, but it's not all changed.  Also, I didn't do any larger or smaller granularity than 1M chunks, so it's possible that even within a specific 1M section of the file, the data might be unchanged, or just reordered, or shifted or something like that ...  When I gzip the files, they compress to approx 20% of their original size, which means there's plenty of repeated patterns within the file, even within the 1M chunks that have changed from rev to rev.
>
> In order to reproduce the problem, I make a new repo, I do a checkout via file:///, I copy rev 1 of some file to the WC, I do an add and commit.  It completes in 11sec.  I then overwrite it with rev2, commit, overwrite with rev3, etc.  After around rev10 or so, suddenly the commit takes 15minutes instead of 10sec.  I destroy my repo and WC and start all over again.  When it happens, I kill -KILL svn, do a svn cleanup, and attempt the commit again.  Once the problem situation is encountered, it doesn't go away until after a successfully completed commit.  As long as I interrupt my commit (and do a cleanup), even if I overwrite the file with various other new versions and attempt the commit, this particular rev is always stuck as a "15min" rev.
>
> In order to get a better understanding of precisely what is the problem, (and precisely what svn is doing during that time) ... svn is 100% cpu bound.  So I have taken the following strategy:
>
> (This is where the question is.)  I am asking you guys if there's any debug mode for svn, or any better way to debug.
>
> I went into subversion/svn, and I edited every single .c file.  I put a fprintf(stderr,"function name\n"); into every function, just to show me where svn is going after it's initiated.  There are a lot of files, and there are a lot of functions within those files.  The flow of the program is far from straightforward.  So far, I've put in a lot of effort, but I don't have any result.  It's bed time.  Tomorrow, unless somebody here offers me any better advice, I plan to continue sprinkling printf()'s into the svn source code, until I can find what functions or sections the process is spending all of its compute cycles in.
>
> People have suggested this is going to be xdelta.  Probably it is.  But it's not yet proven.
>
> Thanks for any tips...
>
>
>
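
The chunk comparison described in the quoted message (an md5 of every 1M
chunk, then a diff of the digests) can be sketched roughly as below. This
is an illustration only, not the script that was actually used; the
function names are hypothetical. It compares chunks at the same offset, so
data that is merely shifted or reordered will show up as changed, which is
the limitation the quoted message already notes.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1M chunks, as in the quoted message


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Return the md5 hex digest of each fixed-size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).hexdigest())
    return digests


def matching_fraction(old_path, new_path, chunk_size=CHUNK_SIZE):
    """Fraction of chunk positions whose digests are identical in both
    files, relative to the longer file's chunk count."""
    old = chunk_digests(old_path, chunk_size)
    new = chunk_digests(new_path, chunk_size)
    if not old or not new:
        return 0.0
    same = sum(1 for a, b in zip(old, new) if a == b)
    return same / max(len(old), len(new))
```

Running matching_fraction() on two consecutive versions of a file gives
the kind of "1 in 20 chunks match" figure mentioned above; a smaller
chunk_size shows whether unchanged data survives at a finer granularity.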

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/
Received on 2010-07-02 13:39:58 CEST

This is an archived mail posted to the Subversion Dev mailing list.
