Re: svn commit: r1127646 - in /subversion/trunk/subversion: svnrdump/load_editor.c tests/cmdline/svnrdump_tests.py

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Thu, 26 May 2011 00:36:48 +0300

I've not read the code, but would an array of 'svn_revnum_t[2]' be
a better representation?

Specifically: an append-only array of [from_rev, to_rev] pairs, sorted
by from_rev. That's less overhead (and we could take advantage of the
sorting to store less data), at the cost of O(log(n)) lookup.

Greg Stein wrote on Wed, May 25, 2011 at 17:21:44 -0400:
> On Wed, May 25, 2011 at 16:08, C. Michael Pilato <cmpilato_at_collab.net> wrote:
> > On 05/25/2011 04:05 PM, C. Michael Pilato wrote:
> >> On 05/25/2011 03:49 PM, Greg Stein wrote:
> >>> On Wed, May 25, 2011 at 15:33, <cmpilato_at_apache.org> wrote:
> >>>> ...
> >>>> + /* A mapping of svn_revnum_t * dump stream revisions to their
> >>>> + corresponding svn_revnum_t * target repository revisions. */
> >>>> + apr_hash_t *rev_map;
> >>>
> >>> How big can this grow? ie. what happens when there are several million
> >>> revisions.
> >>
> >> It gets big. (This logic and approach are copied from 'svnadmin load',
> >> which doesn't excuse it, but might explain it.)
> >
> > Actually, I don't really know for sure how big it gets. It's a mapping of
> > of sizeof(svn_revnum_t) to sizeof(svn_revnum_t), plus all the hash
> > internals. Anybody have any guesses?
>
> struct apr_hash_entry_t is generally 20 bytes. Add in the two revnums
> (4 bytes each), and you get 28 bytes for each *used* entry.
>
> Now we also have to account for unused entries. APR has a pretty poor
> hash table implementation. It allocates *upwards* to the nearest power
> of two. So the internal size will grow like:
>
> 1048576
> 2097152
> 4194304
>
> One saving grace is that APR only grows when the entry count matches
> the internal table size. It uses a "closed hash" algorithm with linked
> lists at each bucket, so the actual load on the buckets is not
> possible to compute. The hand-wave means that you can put in 4 million
> mappings before it grows it up to 8 million buckets.
>
> So... 4 million buckets (pointers) at 4 bytes each is 80 megabytes.
> Each mapping will add another 28 bytes. So: 4 million mappings is
> about 134 megabytes. But also recognize that *reaching* that point
> will use and toss approx the same amount of memory. So about 260 meg
> total.
>
> On a 64-bit architecture, all these values are likely to be doubled.
>
> Not a machine crusher, in retrospect. But not exactly a winner either.
>
> Cheers,
> -g
Received on 2011-05-25 23:37:31 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]