
Re: svnadmin verify performance issue (was: Re: How long do your svn dumps take)

From: Greg Stein <gstein_at_gmail.com>
Date: Fri, 24 Apr 2009 11:58:11 +0200

On Thu, Apr 23, 2009 at 21:08, Stefan Sperling <stsp_at_elego.de> wrote:
> On Thu, Apr 23, 2009 at 01:33:08PM -0500, kmradke_at_rockwellcollins.com wrote:
>> kmradke_at_rockwellcollins.com wrote on 04/22/2009 04:16:12 PM:
>> > (This appears to have bounced the first time.  Retrying...)
>> >
>> > I didn't see many options to gprof, so instead, I rebuilt subversion
>> > (and all of its dependencies) with -pg to include profiling for
>> > all functions and re-ran the same test.  It took over 13 minutes
>> > to run this time, and looks like it includes more information.  I've
>> > attached the new gprof output.
>> >
>> > It may very well be spending a significant amount of time
>> > in apr_hash_next, since it is called over 2 billion times.
>> > apr_pstrdup is called 2.7 billion times and apr_palloc
>> > is called 2.1 billion times...
>> >
>> > This is only verifying 21 revisions, and the repo itself
>> > has over 13,000 revisions...
>> >
>> > The svnadmin process didn't appear to use more than 44MB
>> > of RAM during the test, but it did use 100% of one CPU
>> > pretty much the whole time.
>>
>> Yet another data point: I dumped the repo, reorganized the
>> history with svndumptool so that no directory contains more
>> than 100 subdirectories, and retested.  A full
>> "svnadmin verify" of this modified repo, with the same
>> number of revisions and total files, now takes 6 minutes
>> instead of 40 hours.
>>
>> Something in "svnadmin verify" doesn't appear to like
>> large directories...
>
> I still believe in my theory, regardless of Bert's comments :)
>
> I will try to make a patch eventually if no one else gets around
> to it, and test my theory.
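
(Side note on those gprof numbers: apr_hash_next() is the workhorse
of the standard APR hash-iteration idiom, so two billion calls means
the code is walking some very big hashes, repeatedly. A sketch of
that idiom -- walk_dir_entries() is a made-up name, not a real svn
function:)

    #include <apr_pools.h>
    #include <apr_hash.h>

    /* The standard APR idiom for walking a hash table.  Every step of
     * the loop is one apr_hash_next() call, so a single pass over a
     * 100k-entry directory hash costs ~100k calls -- and the counts
     * above suggest verify makes such passes over and over.
     * (walk_dir_entries is a made-up name, not real svn code.) */
    static void walk_dir_entries(apr_pool_t *pool, apr_hash_t *entries)
    {
        apr_hash_index_t *hi;

        for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi)) {
            const void *key;
            apr_ssize_t klen;
            void *val;

            apr_hash_this(hi, &key, &klen, &val);
            /* ... examine one directory entry ... */
        }
    }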

Note that filling a *large* hash table has O(N^2) time complexity,
which I bet is why this gets so slow. Each time the hash table has to
grow, it must re-insert all of the previously-inserted entries.
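
In sketch form, the growth step of a chained hash table looks like
this (a generic illustration, not apr_hash.c verbatim -- the point is
that expand() touches every entry already stored):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A generic chained hash table, just to show the growth step. */
    typedef struct entry {
        char *key;
        struct entry *next;
    } entry;

    static entry **buckets;
    static unsigned nbuckets = 16, count = 0;
    static unsigned long reinserts = 0;   /* work done purely by growth */

    static unsigned long hash_fn(const char *s)
    {
        unsigned long h = 0;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h;
    }

    static void insert_raw(entry **arr, unsigned n, entry *e)
    {
        unsigned i = hash_fn(e->key) % n;
        e->next = arr[i];
        arr[i] = e;
    }

    /* The expensive part: growing re-inserts every entry that is
     * already in the table. */
    static void expand(void)
    {
        unsigned new_n = nbuckets * 2;
        entry **new_arr = calloc(new_n, sizeof(*new_arr));
        unsigned i;

        for (i = 0; i < nbuckets; i++) {
            entry *e = buckets[i], *next;
            for (; e; e = next) {
                next = e->next;
                insert_raw(new_arr, new_n, e);
                reinserts++;
            }
        }
        free(buckets);
        buckets = new_arr;
        nbuckets = new_n;
    }

    static void insert(const char *key)
    {
        entry *e = malloc(sizeof(*e));
        e->key = strdup(key);
        if (++count > nbuckets)   /* load factor above 1: grow first */
            expand();
        insert_raw(buckets, nbuckets, e);
    }

    int main(void)
    {
        char key[32];
        unsigned i;

        buckets = calloc(nbuckets, sizeof(*buckets));
        for (i = 0; i < 100000; i++) {
            sprintf(key, "entry-%u", i);
            insert(key);
        }
        printf("%u inserts cost %lu extra re-inserts from growth\n",
               count, reinserts);
        return 0;
    }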

Unfortunately, there isn't even a way to pre-size an APR hash table
when you create it.

But: if you re-use a table, it is effectively pre-sized already,
since it keeps its grown bucket array. So yeah... you may be able to
fix the speed problem that way.
(tho I gotta believe the time-constant on that O(N^2) is pretty damned small...)
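
Something along these lines, say -- a sketch only (verify_revisions()
is a made-up name, not real svn code, and apr_hash_clear() requires
APR 1.3 or later):

    #include <apr_pools.h>
    #include <apr_hash.h>

    /* Sketch of the re-use idea: make the hash once, then clear it
     * between revisions instead of apr_hash_make()-ing a fresh one
     * each time.  apr_hash_clear() only removes the entries; the
     * bucket array keeps whatever size it has grown to, so later
     * revisions skip the grow-and-rehash churn. */
    static void verify_revisions(apr_pool_t *pool, long first, long last)
    {
        /* No capacity hint exists: apr_hash_make() takes only a pool. */
        apr_hash_t *entries = apr_hash_make(pool);
        long rev;

        for (rev = first; rev <= last; rev++) {
            /* ... fill ENTRIES while checking revision REV ... */
            apr_hash_clear(entries);   /* empty it, keep the buckets */
        }
    }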

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1889210
Received on 2009-04-24 11:58:26 CEST
