Andreas Schweigstill wrote:
> Hello!
>
> Hyrum K. Wright wrote:
>> Because we store lots of little files with FSFS, we take a space hit. Using an
>> example sector size of 4kB, and assuming the size of the files modulo the sector
>> size is uniformly distributed, we can probabilistically state a 2kB waste of
>> space *per revision*. For our repository, that would mean 34.5 k * 2 kB = ~70
>> MB of wasted space[1], in a ~450 MB repository, by no means trivial. By
>> smashing the little files together, we reduce this internal fragmentation in the
>> file system, and increase space efficiency. Additionally, larger files are
>> easier for the file system and operating system to handle in many cases, and may
>> help with our own internal caching.
>
> But putting multiple revisions into one big file creates a new
> problem. Scripts like svn-fast-backup.py depend on the current
> implementation of just one revision per file. For forensic
> investigation of corrupted repositories and for backup/restore purposes
> it is also much better to have files in the repository that never
> change once they have been created to store a certain revision.
> So in the end I am not sure there will be any space saving at all if
> you also take into account that, for backup purposes, the "big revision
> file" will exist several times on the system. By contrast, hard-linking
> to existing files that never change is quite "cheap".
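The fragmentation estimate quoted above can be checked with a few lines of Python. This sketch covers only that arithmetic and makes the same assumptions as the message: a 4 kB example sector size, and file sizes modulo the sector size uniformly distributed, so each file wastes about half a sector on average. The 34.5 k revision count comes from the message; taking 1 MB as 1000 kB is an assumption chosen to match its rough figures.

```python
# Expected internal fragmentation when each revision is stored in its own file.
# Assumption (from the quoted estimate): file sizes modulo the sector size are
# uniformly distributed, so the unused tail of the last sector averages about
# half a sector.

SECTOR_SIZE_KB = 4      # example sector size used in the message
REVISIONS = 34_500      # ~34.5 k revisions, as in the message


def expected_waste_kb(sector_kb: float) -> float:
    """Mean slack per file under the uniform-remainder assumption."""
    return sector_kb / 2


def total_waste_mb(revisions: int, sector_kb: float) -> float:
    """Total expected slack; 1 MB is taken as 1000 kB for a rough estimate."""
    return revisions * expected_waste_kb(sector_kb) / 1000


if __name__ == "__main__":
    print(f"{total_waste_mb(REVISIONS, SECTOR_SIZE_KB):.1f} MB")  # prints 69.0 MB
```

The result, 69 MB, is the "~70 MB" figure from the message. Packing removes most of this slack because a shard's worth of revisions then shares a single file tail.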

In response to your concerns, I should first mention that this feature is
completely optional, and has to be enabled by an administrator running
'svnadmin pack' on the repository. If people don't want it, they don't have
to use it.

We only pack completed shards, so the resulting pack file is still immutable
and won't be altered by future svn operations. Various scripts and backup
tools may need to be updated to handle the new format, just as they were when
we introduced sharding, but I don't see this as a show-stopper.
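Packing a completed shard can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `svnadmin pack` implementation: the directory layout (`revs/<shard>/<rev>` concatenated into `revs/<shard>.pack/pack` plus a `manifest` of byte offsets), the shard size of 1000, and all function names are hypothetical choices for the example. It shows why the result is immutable. A shard is only packed once its highest revision exists, so no later commit ever needs to append to it.

```python
import tempfile
from pathlib import Path

SHARD_SIZE = 1000  # assumed number of revisions per shard (hypothetical)


def shard_is_complete(shard: int, youngest_rev: int,
                      shard_size: int = SHARD_SIZE) -> bool:
    """A shard is complete once its highest revision number has been committed."""
    return youngest_rev >= (shard + 1) * shard_size - 1


def pack_shard(revs_dir: Path, shard: int, youngest_rev: int,
               shard_size: int = SHARD_SIZE) -> Path:
    """Concatenate one complete shard into <shard>.pack/pack plus a manifest."""
    if not shard_is_complete(shard, youngest_rev, shard_size):
        raise ValueError(f"shard {shard} is still open; refusing to pack it")
    src = revs_dir / str(shard)
    dest = revs_dir / f"{shard}.pack"
    dest.mkdir()
    offsets = []
    with open(dest / "pack", "wb") as pack:
        for rev in range(shard * shard_size, (shard + 1) * shard_size):
            offsets.append(pack.tell())  # byte offset where this revision starts
            pack.write((src / str(rev)).read_bytes())
    # One byte offset per line, in revision order.
    (dest / "manifest").write_text("".join(f"{off}\n" for off in offsets))
    # This sketch leaves the original per-revision files in place.
    return dest


def read_packed_rev(pack_dir: Path, rev: int,
                    shard_size: int = SHARD_SIZE) -> bytes:
    """Look up a revision's byte range in the manifest and read only that slice."""
    offsets = [int(line) for line in (pack_dir / "manifest").read_text().split()]
    idx = rev % shard_size
    pack_path = pack_dir / "pack"
    start = offsets[idx]
    end = offsets[idx + 1] if idx + 1 < len(offsets) else pack_path.stat().st_size
    with open(pack_path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def demo() -> bytes:
    """Pack a tiny 3-revision shard in a scratch directory and read r1 back."""
    with tempfile.TemporaryDirectory() as tmp:
        revs = Path(tmp) / "revs"
        (revs / "0").mkdir(parents=True)
        for rev in range(3):
            (revs / "0" / str(rev)).write_bytes(f"contents of r{rev}\n".encode())
        pack_dir = pack_shard(revs, shard=0, youngest_rev=2, shard_size=3)
        return read_packed_rev(pack_dir, 1, shard_size=3)
```

Calling `demo()` returns `b"contents of r1\n"`. The manifest lets a reader seek straight to one revision, so putting many revisions in one file does not force anyone to read the whole pack.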

In fact, as Blair already pointed out, copying or moving one large file is
usually much quicker than copying or moving many small files, which should
improve the performance of the various backup tools.
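Andreas's point about cheap hard links still applies to packed shards, because a pack file is never rewritten once it exists. The following is a hypothetical sketch of a hard-link snapshot in that spirit. It is not the code of svn-fast-backup.py. Files the caller treats as immutable (here, by assumption, anything under `revs/`, which covers both single revision files and pack files) are hard-linked from the previous snapshot when a file of the same size is already there. Everything else is copied.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def is_immutable(rel: Path) -> bool:
    # Hypothetical rule: everything under "revs/" is written once and never
    # modified afterwards (single revision files and packed shards alike).
    return len(rel.parts) > 1 and rel.parts[0] == "revs"


def snapshot(src: Path, prev: Optional[Path], dest: Path,
             immutable: Callable[[Path], bool] = is_immutable) -> tuple[int, int]:
    """Copy src into dest, hard-linking immutable files from prev when possible.

    Returns (number of hard links made, number of files copied)."""
    linked = copied = 0
    for root, _dirs, files in os.walk(src):
        for name in files:
            rel = (Path(root) / name).relative_to(src)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            old = prev / rel if prev is not None else None
            if (old is not None and immutable(rel) and old.is_file()
                    and old.stat().st_size == (src / rel).stat().st_size):
                os.link(old, target)  # shares storage with the previous snapshot
                linked += 1
            else:
                shutil.copy2(src / rel, target)  # mutable or new: real copy
                copied += 1
    return linked, copied


def demo() -> tuple[tuple[int, int], tuple[int, int]]:
    """Take two snapshots of a toy repository; the second links the pack file."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        (repo / "revs" / "0.pack").mkdir(parents=True)
        (repo / "revs" / "0.pack" / "pack").write_bytes(b"r0..r999 packed")
        (repo / "current").write_text("1000\n")
        first = snapshot(repo, None, Path(tmp) / "backup1")
        second = snapshot(repo, Path(tmp) / "backup1", Path(tmp) / "backup2")
        return first, second
```

Here `demo()` returns `((0, 2), (1, 1))`. The first snapshot copies both files, and the second hard-links the pack file and copies only the mutable `current` file. The immutability rule is a parameter because not everything in a repository is write-once. For example, revision properties can be changed after a commit.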

In short, I don't think the issues you mention are really that bad; they
probably arise from a misunderstanding of how packing works. (That is
understandable: there isn't much documentation written about it yet!)
-Hyrum
Received on 2008-11-28 14:52:39 CET