Re: svn commit: r1127646 - in /subversion/trunk/subversion: svnrdump/load_editor.c tests/cmdline/svnrdump_tests.py

From: Greg Stein <gstein_at_gmail.com>
Date: Wed, 25 May 2011 17:21:44 -0400

On Wed, May 25, 2011 at 16:08, C. Michael Pilato <cmpilato_at_collab.net> wrote:
> On 05/25/2011 04:05 PM, C. Michael Pilato wrote:
>> On 05/25/2011 03:49 PM, Greg Stein wrote:
>>> On Wed, May 25, 2011 at 15:33,  <cmpilato_at_apache.org> wrote:
>>>> ...
>>>> +  /* A mapping of svn_revnum_t * dump stream revisions to their
>>>> +     corresponding svn_revnum_t * target repository revisions. */
>>>> +  apr_hash_t *rev_map;
>>>
>>> How big can this grow? i.e., what happens when there are several million
>>> revisions?
>>
>> It gets big.  (This logic and approach are copied from 'svnadmin load',
>> which doesn't excuse it, but might explain it.)
>
> Actually, I don't really know for sure how big it gets.  It's a mapping of
> sizeof(svn_revnum_t) to sizeof(svn_revnum_t), plus all the hash
> internals.  Anybody have any guesses?

struct apr_hash_entry_t is generally 20 bytes on a 32-bit build. Add in
the two revnums (4 bytes each), and you get 28 bytes for each *used*
entry.

Now we also have to account for unused buckets. APR has a pretty poor
hash table implementation. It sizes its bucket array *upwards* to the
next power of two, doubling it each time it grows. So the internal size
will grow like:

1048576
2097152
4194304

One saving grace is that APR only grows when the entry count matches
the internal table size. It chains entries in a linked list at each
bucket, so the actual load on any one bucket can't be computed in
advance; it depends on how the hash values spread. Hand-waving past
that, you can put in 4 million mappings before it grows the table to
8 million buckets.

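So... 4 million buckets (pointers) at 4 bytes each is 16 megabytes.
Each mapping will add another 28 bytes, or about 117 megabytes for 4
million of them. So: 4 million mappings is about 134 megabytes. But
also recognize that *reaching* that point will allocate and toss each
of the smaller bucket arrays along the way, since the pool never frees
them. Those add up to roughly one more 16-megabyte array (the entries
themselves are relinked on each resize, not copied). So about 150 meg
total.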

On a 64-bit architecture, all these values are likely to be doubled.

Not a machine crusher, in retrospect. But not exactly a winner either.

Cheers,
-g
Received on 2011-05-25 23:22:12 CEST

This is an archived mail posted to the Subversion Dev mailing list.
