
Re: revision files absurdly large at higher revisions

From: Greywolf <greywolf_at_starwolf.com>
Date: Wed, 25 Jan 2012 00:06:52 -0800

On 1/24/2012 23:04, Ryan Schmidt wrote:
> On Jan 24, 2012, at 15:18, The Grey Wolf wrote:
>
>> Hello, I'm not quite sure how to properly phrase the subject as a query
>> term, so if this has been answered, please forgive the redundancy and
>> quietly point me to where this gets addressed.
>>
>> We are using svn at work to hold customer 'vault' data [various bits of
>> information for each customer]. It has been a huge success -- to the
>> point where we have over 1,000 customers using vaults. The checkins are
>> automated, and we have amassed over 100,000 revisions thus far.
>>
>> User directories are created as /Ab/username [where Ab is a 2-character
>> hash via a known (balanced) algorithm to make location of username files
>> more machine-efficient]. So we have about 1,200 of these guys, with some
>> hashes obviously being re-used, no big deal.
>>
>> The problem is that, even on minuscule changes, we are finding the
>> db/rev/<shard>/<revno> files to be disproportionately large; for an
>> addition or change of a file that is about 1k-4k, the rev files are at
>> 100K each. At lower revisions, we noticed that the rev files are 4k but
>> have been increasing in size with each shard that gets added, usually to
>> the tune of 1k/shard. With so many revisions being checked in at a rapid
>> rate, we found ourselves having to take production off line for a couple
>> of minutes while we migrated the repository in question to a larger
>> filesystem due to the threat of the filesystem filling up.
>>
>> The upshot of this is: Why does a minimal delta create such a large
>> delta file? 100k for a small change? What's going on and how can we
>> mitigate this?
>
> It probably has to do with the size of the directory entries, not the
> changes you're making to the files.
>
> If you add a file, that's recorded as a change to the directory. When you
> change a file, Subversion stores only the changes you made, not the
> complete new file, and it stores them compressed. However, when you change
> a directory (e.g. by adding or removing a file or directory), Subversion
> records a complete new copy of the directory, and I don't know if it's
> compressed or not. If the directory has hundreds or thousands of items,
> that will take some space.
>
> I don't remember if modifying a file counts as a change to the directory,
> but adding or deleting a file certainly does.
>
> Based on this I would assume you could mitigate the problem by having fewer
> items in each directory. Create a deeper directory structure from your
> hash: /A/Ab/username, or even /A/Ab/Abc/username. You should try this out
> in a testing environment. Either create some test data, or dump your
> current repository, and then a) load it into a fresh empty repository
> as-is, and b) transform it into a deeper directory structure using a tool
> like svndumptool, then load that into a second fresh empty repository. Then
> see if there is an appreciable size difference.
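Here is a minimal sketch of the layout comparison suggested above. It uses synthetic names, not a real dump. Several pieces are assumptions: SHA-1 rendered in a hypothetical 62-character alphabet stands in for the poster's unspecified 2-character hash, the usernames are made up, and each directory listing above a user's directory is assumed to be rewritten in full on every commit beneath it. The sketch counts the entries in those listings for the flat /Ab/username layout and for the deeper /A/Ab/username layout.

```python
import hashlib
import string
from collections import defaultdict

# Hypothetical 62-symbol alphabet; the poster's real "known (balanced)
# algorithm" is not given, so SHA-1 stands in for it here.
ALPHABET = string.ascii_letters + string.digits


def prefix(username: str, length: int) -> str:
    """First `length` base-62 digits of the username's SHA-1 digest."""
    n = int.from_bytes(hashlib.sha1(username.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def flat_parts(user: str) -> list[str]:
    """Current layout: /Ab/username."""
    return [prefix(user, 2), user]


def nested_parts(user: str) -> list[str]:
    """Deeper layout: /A/Ab/username (A is the first character of Ab)."""
    ab = prefix(user, 2)
    return [ab[0], ab, user]


def build_tree(users, layout):
    """Map each directory (a tuple of path components) to its child names."""
    children = defaultdict(set)
    for user in users:
        parts = layout(user)
        for depth in range(len(parts)):
            children[tuple(parts[:depth])].add(parts[depth])
    return children


def listing_sizes(children, parts):
    """Entry count of each directory from the root down to the user's parent.

    Assumes each of these listings is rewritten in full on every commit
    below it; the user's own directory is identical in both layouts.
    """
    return [len(children[tuple(parts[:depth])]) for depth in range(len(parts))]


if __name__ == "__main__":
    users = [f"customer{i:04d}" for i in range(1200)]  # hypothetical names
    for name, layout in (("flat", flat_parts), ("nested", nested_parts)):
        tree = build_tree(users, layout)
        sizes = listing_sizes(tree, layout(users[0]))
        print(f"{name:6s} entries per rewritten listing: {sizes}")
```

With 1,200 synthetic users spread over 3,844 possible buckets, the flat root ends up with far more entries than the nested root, which can never exceed 62. This only compares listing widths. It is no substitute for the dump-and-load test described above.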

Interesting, to be sure. Here are some stats.

top level = 2817 entries
second level = 1..22 entries [depending on which one]
Some have a third level, most don't; ranges 1..27 entries.

So are you saying that if I add a file /ab/username/file, it's going to copy
the ENTIRE top level directory in as a delta?
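This sketch puts rough numbers on that question under stated assumptions. Each directory listing is modeled as a "K <len> / name / V <len> / value" hash dump, similar to how FSFS serializes directory contents, but the "dir <id>" values and node-revision id syntax are made up. The sketch also assumes that every listing on the path from the root to the changed file is rewritten uncompressed, with no deltification. All entry names are two characters, which undercounts the username and file levels.

```python
import string
from itertools import islice, product

ALPHABET = string.ascii_letters + string.digits  # 62 symbols -> 3844 names


def fake_listing(count: int, rev: int) -> dict[str, str]:
    """`count` directory entries with made-up node-revision ids.

    The "dir <id>" values and the id syntax are illustrative assumptions,
    not the exact FSFS on-disk format.
    """
    names = ("".join(p) for p in product(ALPHABET, repeat=2))
    return {
        name: f"dir {i}.0.r{rev}/{1000 + 37 * i}"
        for i, name in enumerate(islice(names, count))
    }


def serialize_listing(entries: dict[str, str]) -> bytes:
    """Serialize entries in a K/V hash-dump style, terminated by END."""
    parts = []
    for name, value in sorted(entries.items()):
        key, val = name.encode("utf-8"), value.encode("utf-8")
        parts.append(b"K %d\n%s\nV %d\n%s\n" % (len(key), key, len(val), val))
    parts.append(b"END\n")
    return b"".join(parts)


def bytes_per_commit(path_sizes: list[int], rev: int) -> int:
    """Total size if every listing on the path is rewritten uncompressed."""
    return sum(len(serialize_listing(fake_listing(n, rev))) for n in path_sizes)


if __name__ == "__main__":
    # Largest path from the stats above: top, second and third level.
    path = [2817, 22, 27]
    total = bytes_per_commit(path, rev=100_000)
    top = len(serialize_listing(fake_listing(path[0], rev=100_000)))
    print(f"~{total / 1024:.0f} KiB per commit, ~{top / 1024:.0f} KiB from the top level")
```

Under this model, nearly all of the per-commit cost comes from the 2,817-entry top-level listing, not from the small second- and third-level directories.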


-- 
				--*greywolf;
Received on 2012-01-25 09:08:02 CET

This is an archived mail posted to the Subversion Users mailing list.
