
Re: crash managing a large FSFS repository

From: Simon Spero <ses_at_unc.edu>
Date: 2004-12-13 21:35:15 CET

kfogel@collab.net wrote:

>>At the moment the code uses memory roughly proportional to the total
>>lengths of all paths in the transactions.
>>
>>
>
>is both true and the cause of our problems. I'm pretty sure it's
>true, of course, it's the second half I'm not positive about :-).
>Are you sure that path lengths are relevant to total memory usage, or
>are they just lost in the noise?
>
>
   
A few rough estimators:

The original problem report concerned importing the NetBSD source
tree, so I unpacked the files from the NetBSD 2.0 source ISO.
The original report involved fewer files (~120,000), but we're only
doing big-O estimates here.

Noise sources:
    The original report is for a memory spike from 19MB -> 44MB, so
results on the order of megabytes are possibly significant.
    Hashtable array size is always a power of two; hash node size is
~20 bytes.
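
A rough sketch of how much that hashtable noise could amount to. The
~20-byte node size is the figure above; the 4-byte bucket pointer and
the assumption that the bucket array is the smallest power of two that
holds every entry are mine, not taken from the actual hash code, so
treat the result as an order-of-magnitude guess only.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (and at least 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def hash_overhead_bytes(entries: int,
                        node_bytes: int = 20,    # node size quoted in the mail
                        pointer_bytes: int = 4   # assumption: 32-bit pointers
                        ) -> int:
    """Estimate bucket-array plus node overhead for a chained hash table.

    Assumes (hypothetically) one bucket pointer per slot and a bucket
    array rounded up to the next power of two of the entry count.
    """
    buckets = next_power_of_two(entries)
    return buckets * pointer_bytes + entries * node_bytes


if __name__ == "__main__":
    for n in (120_000, 193_716):
        print(n, next_power_of_two(n), hash_overhead_bytes(n))
```

Under those assumptions the overhead for either file count comes out
at a few megabytes, i.e. the same order as the numbers being measured.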
   

First metric was to run find . >/tmp/find-netbsd.
    Total size is 8,139,654 Bytes. (wc -c)
    Number of entries: 193,716 (wc -l)
    Average path length: ~42 bytes
    Measurements were made relative to '.'; paths in memory would be
relative to the root of the repository. Adding /trunk/ to the start of
each path would use an extra 6 chars per entry (~1.1MB in this case).
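
For anyone who wants to repeat the first metric without wc, here is a
minimal Python sketch. The function name and the prefix_bytes
parameter are made up for illustration; it counts bytes and lines the
same way wc -c and wc -l do, so newlines are included in the totals.

```python
def path_stats(listing: str, prefix_bytes: int = 0) -> dict:
    """Summarise a newline-terminated path listing such as `find .` output.

    total_bytes includes the newlines (like `wc -c`); entries is the
    number of newline characters (like `wc -l`). prefix_overhead is the
    extra space needed if every path gained prefix_bytes characters.
    """
    data = listing.encode("utf-8")
    entries = data.count(b"\n")
    total = len(data)
    return {
        "total_bytes": total,
        "entries": entries,
        "average": total / entries if entries else 0.0,
        "prefix_overhead": entries * prefix_bytes,
    }


if __name__ == "__main__":
    sample = "./a\n./a/b.c\n./dd\n"   # tiny made-up listing
    print(path_stats(sample, prefix_bytes=6))
```

Plugging in the figures above, 8,139,654 / 193,716 gives the ~42 byte
average, and 193,716 * 6 gives 1,162,296 bytes of prefix overhead.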

Second metric is to strip out everything but the last name component
(sed -e 's;^.*/;;')
    Total size: 1,654,088
    Number of entries: 193,716 (wc -l)
    Average size: ~8.5 bytes

Third metric: size of interned path-name components (sed ... | sort | uniq)
    Total size: 577,432
    Number of unique strings: 51,236
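
The second and third metrics can be sketched in Python as well. The
helper names are hypothetical; basenames() mirrors the greedy sed
expression (keep what follows the last '/'), and interned() mirrors
sort | uniq by keeping one copy of each distinct component.

```python
def basenames(paths):
    """Keep only the text after the last '/', like sed -e 's;^.*/;;'."""
    return [p.rsplit("/", 1)[-1] for p in paths]


def listing_bytes(names):
    """Bytes the names would occupy one per line (newline included)."""
    return sum(len(n.encode("utf-8")) + 1 for n in names)


def interned(names):
    """One copy of each distinct name, like sort | uniq."""
    return sorted(set(names))


if __name__ == "__main__":
    # Tiny made-up tree; real input would be the lines of the find output.
    paths = [".", "./src", "./src/Makefile", "./lib", "./lib/Makefile"]
    names = basenames(paths)
    unique = interned(names)
    print(len(names), listing_bytes(names))
    print(len(unique), listing_bytes(unique))
```

The gap between the two byte counts is what interning the path-name
components would save, before counting any table overhead.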

Bonus metric: estimate entropy using bzip2 -9 -v
    Full pathnames: 0.606 bits/byte, 92.42% saved, 8139654 in, 616601 out
    Basenames: 1.221 bits/byte, 84.73% saved, 1654088 in, 252521 out
    Interned: 2.807 bits/byte, 64.91% saved, 577432 in, 202629 out
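
The bits/byte figures are just 8 * bytes out / bytes in. A small
sketch using Python's standard bz2 module in place of the bzip2
command line; the function names are mine, and the library's output
size may differ by a few bytes from what the bzip2 tool reports.

```python
import bz2


def reported_bits_per_byte(bytes_in: int, bytes_out: int) -> float:
    """Bits of compressed output per byte of input, as bzip2 -v prints it."""
    return 8 * bytes_out / bytes_in


def bits_per_byte(data: bytes, level: int = 9) -> float:
    """Compress data with bz2 at the given level and return bits per input byte."""
    if not data:
        return 0.0
    return 8 * len(bz2.compress(data, level)) / len(data)


if __name__ == "__main__":
    # Sizes taken from the three bzip2 runs above.
    for label, n_in, n_out in (("full", 8139654, 616601),
                               ("base", 1654088, 252521),
                               ("interned", 577432, 202629)):
        print(label, round(reported_bits_per_byte(n_in, n_out), 3))
```

To run it on real data, pass the contents of the find output, e.g.
bits_per_byte(open("/tmp/find-netbsd", "rb").read()).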
Received on Mon Dec 13 23:12:07 2004

This is an archived mail posted to the Subversion Users mailing list.
