
Re: revision files absurdly large at higher revisions

From: Ryan Schmidt <subversion-2012a_at_ryandesign.com>
Date: Wed, 25 Jan 2012 01:04:57 -0600

On Jan 24, 2012, at 15:18, The Grey Wolf wrote:

> Hello, I'm not quite sure how to properly phrase the subject
> as a query term, so if this has been answered, please forgive
> the redundancy and quietly point me to where this gets addressed.
>
> We are using svn at work to hold customer 'vault' data [various bits
> of information for each customer]. It has been a huge success -- to
> the point where we have over 1,000 customers using vaults. The checkins
> are automated, and we have amassed over 100,000 revisions thus far.
>
> User directories are created as /Ab/username [where Ab is a 2-character
> hash via a known (balanced) algorithm to make location of username files more
> machine-efficient]. So we have about 1,200 of these guys, with some hashes
> obviously being re-used, no big deal.
>
> The problem is that, even on minuscule changes, we are finding the
> db/rev/<shard>/<revno> files to be disproportionately large; for an
> addition or change of a file that is about 1k-4k, the rev files are
> at 100K each. At lower revisions, we noticed that the rev files are
> 4k but have been increasing in size with each shard that gets added,
> usually to the tune of 1k/shard. With so many revisions being checked
> in at a rapid rate, we found ourselves having to take production
> offline for a couple of minutes while we migrated the repository in question
> to a larger filesystem due to the threat of the filesystem filling
> up.
>
> The upshot of this is: Why does a minimal delta create such a large
> delta file? 100k for a small change? What's going on and how can we
> mitigate this?

It probably has to do with the size of the directory entries, not the changes you're making to the files.

Adding a file is recorded as a change to its parent directory. When you change a file, Subversion stores only the changes you made, not the complete new file, and it stores them compressed. However, when you change a directory (e.g. by adding or removing a file or subdirectory), Subversion records a complete new copy of the directory's listing, and I don't know whether that copy is compressed. If the directory has hundreds or thousands of items, each of those copies takes up a noticeable amount of space.

I don't remember whether modifying a file counts as a change to the directory, but adding or deleting a file certainly does.
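To get a feel for how big one full copy of a large directory listing can be, here is a rough sketch. It assumes a listing serialized in a "K <len> / V <len> ... END" key/value style modeled on Subversion's serialized hash format; the exact on-disk format differs between Subversion versions, and the node-revision IDs produced by the hypothetical helper `fake_listing` are invented for illustration only.

```python
import itertools
import string


def serialize_dir(entries):
    """Approximate a directory listing in a K/V "hash dump" style.

    entries maps child name -> "kind node-rev-id", e.g. "dir 0-1.0.r5/10".
    This is a model for size estimates, not the exact FSFS on-disk format.
    """
    parts = []
    for name in sorted(entries):
        key = name.encode("utf-8")
        val = entries[name].encode("utf-8")
        parts.append(b"K %d\n%s\nV %d\n%s\n" % (len(key), key, len(val), val))
    parts.append(b"END\n")  # terminator line
    return b"".join(parts)


def fake_listing(n, revision):
    """Hypothetical helper: n two-character child dirs with made-up IDs."""
    alphabet = string.ascii_letters + string.digits  # up to 62*62 names
    names = ("".join(p) for p in itertools.product(alphabet, repeat=2))
    return {
        name: "dir %d-%d.0.r%d/%d" % (i, i, revision, 1000 + i)
        for i, name in zip(range(n), names)
    }


if __name__ == "__main__":
    for n in (10, 100, 1000):
        size = len(serialize_dir(fake_listing(n, 100000)))
        print(n, "entries ->", size, "bytes per full copy of the listing")
```

With these made-up IDs, a 1,000-entry listing comes to roughly 38 KB, and that cost is paid again every time the directory is rewritten, no matter how small the file change that triggered it.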

Based on this, I would expect you could mitigate the problem by keeping fewer items in each directory. Create a deeper directory structure from your hash: /A/Ab/username, or even /A/Ab/Abc/username. Try this out in a testing environment first. Either create some test data, or dump your current repository and then a) load it as-is into a fresh empty repository, and b) transform it into the deeper directory structure using a tool like svndumptool and load that into a second fresh empty repository. Then compare the two to see whether there is an appreciable size difference.
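A minimal sketch of the deeper layout, assuming a stand-in hash (the hex SHA-256 digest of the username), since the poster's own balanced 2-character algorithm isn't described. The names `bucket_path`, `largest_directory` and the `customerNNNN` usernames are hypothetical. The sketch only counts how many entries the most crowded directory would hold under each layout; it does not touch a repository.

```python
import hashlib
from collections import defaultdict


def bucket_path(username, prefix_lengths):
    """Map a username to a path such as /a/ab/abc/username.

    Stand-in hash (assumption): hex SHA-256 of the username.
    prefix_lengths=[2] mimics the flat /Ab/username layout;
    [1, 2] or [1, 2, 3] add intermediate levels built from hash prefixes.
    """
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    levels = [digest[:k] for k in prefix_lengths]
    return "/".join([""] + levels + [username])


def largest_directory(usernames, prefix_lengths):
    """Return the entry count of the most crowded directory in the tree."""
    children = defaultdict(set)
    for name in usernames:
        parts = bucket_path(name, prefix_lengths).split("/")[1:]
        for i, part in enumerate(parts):
            # parts[:i] identifies the parent directory of this component
            children[tuple(parts[:i])].add(part)
    return max(len(entries) for entries in children.values())


if __name__ == "__main__":
    users = ["customer%04d" % i for i in range(1200)]  # made-up usernames
    for layout in ([2], [1, 2], [1, 2, 3]):
        print(layout, "-> largest directory:", largest_directory(users, layout))
```

With hex digits the flat layout tops out at 256 top-level entries, fewer than a mixed-case 2-character hash allows, but the relative drop is the point: each extra level caps a directory at 16 entries, so every rewritten listing stays small.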
Received on 2012-01-25 08:06:00 CET

This is an archived mail posted to the Subversion Users mailing list.
