[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: crash managing a large FSFS repository

From: <kfogel_at_collab.net>
Date: 2004-12-13 22:39:08 CET

Simon Spero <ses@unc.edu> writes:
> Noise sources :
> Original report is for a memory spike from 19Mb -> 44Mb, so
> results on the order of megabytes are possibly significant.
> Hashtable array size is always a power of two; hash node size is ~20
> bytes.
> First metric was to run find . >/tmp/find-netbsd. Total size is
> 8,139,654 Bytes. (wc -c)
> Number of entries: 193,716 (wc -l)
> Average path length: ~42 bytes
> Measurements were made relative to '.' ; paths in memory would be
> relative to the root of the repository. Adding /trunk/ to start of
> each path would use an extra 6 chars per entry (~1.1MB in this case)

Okay, so 193,716 * (42 + 6) == 9,298,368 == appx 9 MB

That's bad, but is it responsible for a 25 MB jump in memory usage?
Is there other data associated with each of these paths, whose size is
proportional to path depth?

> Second metric is to strip out everthing but the last name component:
> ( sed -e 's;^.*/;;' )
> Total size: 1,654,088
> Number of entries: 193,716 (wc -l)
> Average size: 8 bytes

(What was this metric for?)

This gets us 1549728 == appx 1.5 MB, much better relative to the
earlier figure, but somewhat less better relative to the mem jump
we're seeing. Hmmm. I confess I'm not sure what to think here.

> Third metric: size of interned path-name components (sed ... | sort | uniq)
> Total size: 577,432
> Number of unique strings: 51,236

(This was for estimating if we used a tree-style data structure, right?)

So, taking an average size of 8 bytes (ah, maybe that's what your
middle metric was about), we can multiply 51,236 * 8 to get 409,888.
Nice, about half a meg. Of course, we need to add in the usual tree
structure overhead, which is a whole hash-table per unique entry
except for the leaf nodes. I'm not really sure how to estimate that.
It's more than log(51,236), but less than 51,236. Plus we need a
4-byte pointer per entry...

So, is it really looking so much better than 9 MB, in the long run?

I don't mean to be reflexively skeptical, but at least this
back-of-the-envelope estimate doesn't look promising. Maybe I'm
missing something, though?


To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Received on Mon Dec 13 22:41:11 2004

This is an archived mail posted to the Subversion Users mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.