On 1/24/2012 23:04, Ryan Schmidt wrote:
> On Jan 24, 2012, at 15:18, The Grey Wolf wrote:
>> Hello, I'm not quite sure how to properly phrase the subject as a query
>> term, so if this has been answered, please forgive the redundancy and
>> quietly point me to where this gets addressed.
>> We are using svn at work to hold customer 'vault' data [various bits of
>> information for each customer]. It has been a huge success -- to the
>> point where we have over 1,000 customers using vaults. The checkins are
>> automated, and we have amassed over 100,000 revisions thus far.
>> User directories are created as /Ab/username [where Ab is a 2-character
>> hash via a known (balanced) algorithm to make location of username files
>> more machine-efficient]. So we have about 1,200 of these guys, with some
>> hashes obviously being re-used, no big deal.
>> The problem is that, even on minuscule changes, we are finding the
>> db/rev/<shard>/<revno> files to be disproportionately large; for an
>> addition or change of a file that is about 1k-4k, the rev files are about
>> 100K each. At lower revisions, we noticed that the rev files are 4k but
>> have been increasing in size with each shard that gets added, usually to
>> the tune of 1k/shard. With so many revisions being checked in at a rapid
>> rate, we found ourselves having to take production offline for a couple
>> of minutes while we migrated the repository in question to a larger
>> filesystem due to the threat of the filesystem filling up.
>> The upshot of this is: Why does a minimal delta create such a large
>> delta file? 100k for a small change? What's going on and how can we
>> mitigate this?
> It probably has to do with the size of the directory entries, not the
> changes you're making to the files.
> If you add a file, that's recorded as a change to the directory. When you
> change a file, Subversion stores only the changes you made, not the
> complete new file, and it stores them compressed. However, when you change
> a directory (e.g. by adding or removing a file or directory), Subversion
> records a complete new copy of the directory, and I don't know if it's
> compressed or not. If the directory has hundreds or thousands of items,
> that will take some space.
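To get a feel for why a directory listing can dominate a small commit, here is a minimal Python sketch that serializes a listing as length-prefixed key/value records and measures the result. It assumes, without checking the real on-disk format, that each entry costs roughly its name plus a node-revision ID of a few dozen bytes; the names and IDs generated below are hypothetical.

```python
import string
from itertools import product


def hash_dump(entries: dict[str, str]) -> bytes:
    """Serialize entries as length-prefixed K/V records followed by END.

    Assumption: this only approximates the plain-text listing format used
    for directory contents; real files may differ in detail.
    """
    out = []
    for name, value in sorted(entries.items()):
        k, v = name.encode("utf-8"), value.encode("utf-8")
        out.append(b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v))
    out.append(b"END\n")
    return b"".join(out)


def fake_listing(n: int) -> dict[str, str]:
    """n two-character names mapped to hypothetical 'dir <node-rev-id>' values."""
    alphabet = string.ascii_letters + string.digits  # 62 * 62 = 3844 possible names
    names = ["".join(p) for p in product(alphabet, repeat=2)][:n]
    return {
        name: f"dir {i:x}.0.r{100000 + i}/{12345 + i}"  # made-up node-revision ID
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    for n in (10, 100, 2817):
        print(n, "entries ->", len(hash_dump(fake_listing(n))), "bytes")
```

Under these assumptions, 2,817 entries (the top-level count reported further down) come to roughly 100 KB, the same order of magnitude as the rev files described above. That is suggestive, not a measurement of the actual repository.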
> I don't remember if modifying a file counts as a change to the directory,
> but adding or deleting a file certainly does.
> Based on this I would assume you could mitigate the problem by having fewer
> items in each directory. Create a deeper directory structure from your
> hash: /A/Ab/username, or even /A/Ab/Abc/username. You should try this out
> in a testing environment. Either create some test data, or dump your
> current repository, and then a) load it into a fresh empty repository
> as-is, and b) transform it into a deeper directory structure using a tool
> like svndumptool, then load that into a second fresh empty repository. Then
> see if there is an appreciable size difference.
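The deeper layout suggested above can be sketched as follows. The real hashing algorithm isn't given in the thread, so `bucket()` below is a hypothetical stand-in (three hex digits of an MD5 digest), and the 1,200 usernames are made up. The script builds paths for the current flat layout and the two deeper ones, then reports how many entries the root listing and the largest listing hold.

```python
import hashlib
from collections import defaultdict


def bucket(username: str) -> str:
    # Hypothetical stand-in for the site's "known (balanced)" hash:
    # the first three hex digits of an MD5 digest of the username.
    return hashlib.md5(username.encode("utf-8")).hexdigest()[:3]


def nested_path(username: str, prefix_lengths: tuple[int, ...]) -> str:
    """(2,) gives /ab/user; (1, 2) gives /a/ab/user; (1, 2, 3) gives /a/ab/abc/user."""
    h = bucket(username)
    return "/" + "/".join([h[:n] for n in prefix_lengths] + [username])


def entries_per_dir(paths: list[str]) -> dict[str, int]:
    """Count the immediate children of every directory implied by the paths."""
    children: dict[str, set[str]] = defaultdict(set)
    for p in paths:
        parts = p.strip("/").split("/")
        for depth, name in enumerate(parts):
            parent = "/" + "/".join(parts[:depth])
            children[parent].add(name)
    return {d: len(c) for d, c in children.items()}


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1200)]  # hypothetical usernames
    for layout in [(2,), (1, 2), (1, 2, 3)]:
        counts = entries_per_dir([nested_path(u, layout) for u in users])
        print(layout, "root entries:", counts["/"],
              "largest listing:", max(counts.values()))
```

Because a hex prefix only has 16 values per character, the flat layout here tops out at 256 top-level buckets, fewer than the 2-character alphabet in use at this site; the point is the relative drop in the size of the listings near the root, which sit on the path of every user's files.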
Interesting, to be sure. Here are some stats.
top level = 2817 entries
second level = 1..22 entries [depending on which one]
Some have a third level, most don't; where present, it has 1..27 entries.
So are you saying that if I add a file /ab/username/file, it's going to copy
the ENTIRE top level directory in as a delta?
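A rough way to reason about this question is to list every directory that contains the new file and add up their sizes, assuming that each of them, up to and including the root, gets a complete new listing written (the behaviour described above for the directory being changed). The entry counts come from the stats above, taking the upper end of each range; `BYTES_PER_ENTRY` is an assumed average, not a measured value.

```python
# Assumed average size of one directory entry (short name plus a
# node-revision ID); not measured from the real repository.
BYTES_PER_ENTRY = 36


def ancestors(path: str) -> list[str]:
    """Directories that contain `path`, from the root down: /, /ab, /ab/username."""
    parts = path.strip("/").split("/")[:-1]
    return ["/" + "/".join(parts[:d]) for d in range(len(parts) + 1)]


def estimated_rewrite_bytes(path: str, entry_counts: dict[str, int]) -> int:
    """Sum the assumed listing sizes of every directory above `path`."""
    return sum(entry_counts[d] * BYTES_PER_ENTRY for d in ancestors(path))


if __name__ == "__main__":
    # Upper ends of the ranges quoted in the stats above.
    counts = {"/": 2817, "/ab": 22, "/ab/username": 27}
    path = "/ab/username/file"
    for d in ancestors(path):
        print(d, counts[d] * BYTES_PER_ENTRY, "bytes")
    print("total:", estimated_rewrite_bytes(path, counts), "bytes")
```

Under these assumptions the root listing accounts for nearly all of the estimate, so the number of top-level entries matters far more than the size of the file being added.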
Received on 2012-01-25 09:08:02 CET