Simon Spero <ses@unc.edu> writes:
> Noise sources :
>     Original report is for  a memory spike from 19Mb ->  44Mb, so
> results on the order of megabytes are possibly significant.
> Hashtable array size is  always a power of two; hash node size is ~20
> bytes.
>    First metric was to run  find . >/tmp/find-netbsd.   Total  size is
> 8,139,654 Bytes. (wc -c)
>     Number of entries:  193,716  (wc -l)
>     Average path length: ~42 bytes
>     Measurements were made relative to '.' ;  paths in memory would be
> relative to the root of the repository.  Adding /trunk/ to start of
> each path would use an extra 6 chars per entry (~1.1MB  in this case)
Okay, so 193,716 * (42 + 6) == 9,298,368 == appx 9 MB
That's bad, but is it responsible for a 25 MB jump in memory usage?
Is there other data associated with each of these paths, whose size is
proportional to path depth?
> Second metric is to strip out everthing but the last name component:
> ( sed -e 's;^.*/;;' )
>     Total size: 1,654,088
>     Number of entries:  193,716  (wc -l)
>     Average size: 8 bytes
(What was this metric for?)
This gets us 1549728 == appx 1.5 MB, much better relative to the
earlier figure, but somewhat less better relative to the mem jump
we're seeing.  Hmmm.  I confess I'm not sure what to think here.
> Third metric: size of interned path-name components (sed ... | sort | uniq)
>     Total size: 577,432
>     Number of unique strings: 51,236
(This was for estimating if we used a tree-style data structure, right?)
So, taking an average size of 8 bytes (ah, maybe that's what your
middle metric was about), we can multiply 51,236 * 8 to get 409,888.
Nice, about half a meg.  Of course, we need to add in the usual tree
structure overhead, which is a whole hash-table per unique entry
except for the leaf nodes.  I'm not really sure how to estimate that.
It's more than log(51,236), but less than 51,236.  Plus we need a
4-byte pointer per entry...
So, is it really looking so much better than 9 MB, in the long run?
I don't mean to be reflexively skeptical, but at least this
back-of-the-envelope estimate doesn't look promising.  Maybe I'm
missing something, though?
-Karl
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Dec 13 22:42:24 2004