At the risk of dragging this out even further, here's another RFC.
There were a lot of comments about whether the right size for a sharded
filesystem was 1000, 4000, or whatever. Rather than rely on guesswork,
I've tried to measure the lookup time.
I created a 2GB file on my local (ext3) partition and created a variety
of filesystems on it in turn. I then mounted each new filesystem via a
loopback mount and created 2^20 empty files on it, measuring amortised
open() time at various points. I tested using the current scheme (all
files in one directory) and the sharding-by-1000 scheme.
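In case anyone wants to reproduce this, the heart of the test was a
loop along the lines of the sketch below. This is a simplified
reconstruction rather than my exact harness: the mount point, the
constants, and the choice of measurement points are illustrative.

  /* Create FILE_COUNT empty files (sharded or flat) and print the
   * amortised open() time at each power of two.
   * Setup beforehand, as root, something like:
   *   dd if=/dev/zero of=fs.img bs=1M count=2048
   *   mkfs.ext3 fs.img && mount -o loop fs.img /mnt/test
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/time.h>
  #include <unistd.h>

  #define FILE_COUNT (1 << 20)  /* 2^20 files */
  #define SHARD_SIZE 1000       /* files per shard; 0 tests the flat layout */

  /* Build the path of file number n, sharded or not. */
  static void file_path(char *buf, size_t len, int n)
  {
    if (SHARD_SIZE)
      snprintf(buf, len, "/mnt/test/%d/%d", n / SHARD_SIZE, n);
    else
      snprintf(buf, len, "/mnt/test/%d", n);
  }

  static double now(void)
  {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
  }

  int main(void)
  {
    char path[128];

    for (int n = 0; n < FILE_COUNT; n++)
      {
        /* Create the shard directory on demand. */
        if (SHARD_SIZE && n % SHARD_SIZE == 0)
          {
            snprintf(path, sizeof(path), "/mnt/test/%d", n / SHARD_SIZE);
            mkdir(path, 0755);
          }
        file_path(path, sizeof(path), n);
        close(open(path, O_CREAT | O_WRONLY, 0644));

        /* When the file count reaches a power of two, re-open every
           file created so far and report the amortised open() time. */
        if ((n & (n + 1)) == 0 && n >= 1023)
          {
            double start = now();
            for (int i = 0; i <= n; i++)
              {
                file_path(path, sizeof(path), i);
                close(open(path, O_RDONLY));
              }
            printf("%8d files: %.3f us/open\n", n + 1,
                   (now() - start) / (n + 1) * 1e6);
          }
      }
    return 0;
  }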
I tested ext2, ext3 (with and without directory indexing), reiserfs,
vfat, and hfsplus. I wasn't able to test NTFS, unfortunately.
First, the problems with this approach:
- I'm mounting the filesystem loopback on another filesystem.
- I have no data in the files, so the filesystem fits entirely into my
OS buffer cache.
- I'm only measuring average-case performance.
Now, the results:
- With one exception, sharding is always more expensive than not
sharding (because it adds an extra directory lookup). The difference
is negligible, however, because we're still talking about microseconds.
- Below 1024 files, lookup time was essentially constant. At 2048 files
and above, there was a small logarithmic increase, comparable to that
incurred by adding sharding.
- With one exception, I saw no appreciable difference in lookup time
between a directory with 4096 files and one with 2^20 files. That may
indicate a flaw in my methodology, since it wasn't what I expected.
- The sole exception to the above was vfat, which exhibited O(N) lookup
time and broke down completely after 32,768 files. vfat had a similar
lookup time to the other filesystems up to just over 1024 files per
directory.
I don't know whether my results demonstrate anything more than the
codepath taken through the Linux filesystem drivers, but here's what
I'm going to do anyway:
> - For 1.5, FSFS repositories will be created as sharded by default.
> We'll bump the FSFS format number from 2 to 3 (meaning: can have
> svndiff1, sharded).
- We'll create shards of 1000 entries each (a sketch of the resulting
layout follows at the end of this mail). Anyone who has a repository
with more than a million revisions will likely be running on a decent
filesystem. Even if they aren't, we will still be much better off than
we were before.
- We'll write out the filesystem organisation scheme to a file in the
filesystem, so that the files-per-shard count (or the scheme itself)
can change in the future. We'll read that in at filesystem open time
and cache it in the per-filesystem cached data: it'll be read at most
once per process (there's a sketch of this below, too).
+ Anyone changing that file without a good understanding of what it
does will win a broken filesystem. I'm not about to protect people
from stupid administrators.
> - The revision files will be named according to the scheme I posted
> earlier, and shards created on-demand.
> - We'll add a --pre-1.5-compatible flag to svnadmin (and the equivalent
> to the fs-config hash), which will create FSFS filesystems with format
> 2 by default, and be a no-op for BDB filesystems. --pre-1.4-compatible
> will also imply --pre-1.5-compatible.
> - We'll create a tool to do an offline upgrade between formats 2 and 3
> (it'll run _much_ faster than a dump/load, and the people who'd benefit
> most from this change are also the ones that can least allow time for
> a dump/load). Since FSFS has no way to lock out readers, we'll have
> to ask the repository administrator to make sure there aren't any.
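To make the naming concrete, here's an illustrative sketch of the path
computation for the sharded layout. The rev / 1000 grouping matches the
scheme above, but the helper name and the db path are made up for the
example:

  #include <stdio.h>

  #define FILES_PER_SHARD 1000

  /* Illustrative: build the path of a revision file under the sharded
     layout, grouping revisions by rev / FILES_PER_SHARD. */
  static void rev_file_path(char *buf, size_t len,
                            const char *fs_path, long rev)
  {
    snprintf(buf, len, "%s/revs/%ld/%ld",
             fs_path, rev / FILES_PER_SHARD, rev);
  }

  int main(void)
  {
    char path[256];
    rev_file_path(path, sizeof(path), "/repos/db", 1234567);
    puts(path);   /* prints /repos/db/revs/1234/1234567 */
    return 0;
  }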
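And a sketch of reading the organisation scheme at open time. Again,
this is illustrative: the file name, its layout, and the struct are
assumptions for the example, not the final design.

  #include <stdio.h>

  /* Illustrative per-filesystem data; in real code this would live in
     the baton that's created once per open filesystem. */
  typedef struct fs_data_t
  {
    int format;           /* FSFS format number: 2 = flat, 3 = sharded */
    long files_per_shard; /* 0 = unsharded */
  } fs_data_t;

  /* Read the scheme once, at filesystem open time; callers then use
     the cached copy, so the file is read at most once per process. */
  static int read_fs_scheme(fs_data_t *fsd, const char *fs_path)
  {
    char scheme_path[256];
    FILE *fp;

    snprintf(scheme_path, sizeof(scheme_path), "%s/format", fs_path);
    fp = fopen(scheme_path, "r");
    if (!fp)
      return -1;
    if (fscanf(fp, "%d %ld", &fsd->format, &fsd->files_per_shard) != 2)
      {
        fclose(fp);
        return -1;
      }
    fclose(fp);
    return 0;
  }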
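Finally, the reason the offline upgrade runs so much faster than a
dump/load is that it's essentially just a rename loop. An illustrative
sketch of the core (error handling and the final format-number bump
are omitted, and the temp name is made up):

  #include <stdio.h>
  #include <sys/stat.h>

  #define FILES_PER_SHARD 1000

  /* Move each flat revision file revs/<rev> into its shard directory
     revs/<rev / FILES_PER_SHARD>/<rev>. */
  static void shard_rev_files(const char *fs_path, long youngest)
  {
    char old_path[256], tmp_path[256], new_path[256];

    snprintf(tmp_path, sizeof(tmp_path), "%s/revs/in-flight", fs_path);

    for (long rev = 0; rev <= youngest; rev++)
      {
        /* Move the flat file out of the way first: the directory for
           shard 0 shares its name with revision 0's file. */
        snprintf(old_path, sizeof(old_path), "%s/revs/%ld", fs_path, rev);
        rename(old_path, tmp_path);

        if (rev % FILES_PER_SHARD == 0)
          {
            snprintf(new_path, sizeof(new_path), "%s/revs/%ld",
                     fs_path, rev / FILES_PER_SHARD);
            mkdir(new_path, 0755);
          }

        snprintf(new_path, sizeof(new_path), "%s/revs/%ld/%ld",
                 fs_path, rev / FILES_PER_SHARD, rev);
        rename(tmp_path, new_path);
      }
  }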