
Re: Sharded FSFS repositories

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2007-03-20 15:16:19 CET

At the risk of dragging this out even further, here's another RFC.

There were a lot of comments about whether the right size for a sharded
filesystem was 1000, 4000, or whatever. Rather than rely on guesswork,
I've tried to measure the lookup time.

I created a 2GB file on my local (ext3) partition and created a variety
of filesystems on it via a loopback mount. I then mounted the new
filesystem and created 2^20 empty files on it, measuring amortised
open() time at various points. I tested using the current scheme (all
files in one directory) and the sharding-by-1000 scheme.
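
The measurement itself is conceptually just the loop below (a
simplified sketch, not the actual harness; the mount point, file
names, and sample count are illustrative, and error handling is
minimal):

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>
  #include <unistd.h>

  static double now(void)
  {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
  }

  int main(void)
  {
    char path[64];
    int i, samples = 4096;
    double start;

    /* Assumes the loopback filesystem is mounted at /mnt/test and
       already populated with empty files named f0 .. fN. */
    start = now();
    for (i = 0; i < samples; i++)
      {
        int fd;
        snprintf(path, sizeof(path), "/mnt/test/f%d", rand() % samples);
        fd = open(path, O_RDONLY);
        if (fd >= 0)
          close(fd);
      }
    printf("amortised open(): %.3f us\n",
           (now() - start) * 1e6 / samples);
    return 0;
  }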

I tested ext2, ext3 (with and without dirindexing), reiserfs, vfat, and
hfsplus. I wasn't able to test NTFS, unfortunately.

First, the problems with this approach:
- I'm mounting the filesystem loopback on another filesystem.
- I have no data in the files, so the filesystem fits entirely into my
  OS buffer cache.
- I'm only measuring average-case performance.

My results:
- With one exception, sharding is always more expensive than not
  sharding (because it adds an extra directory lookup). The difference
  is negligible, however, because we're still talking about microseconds
  per lookup.
- Below 1024 files, lookup time was essentially constant. At 2048 files
  and above, there was a small logarithmic increase, comparable to that
  incurred by adding sharding.
- With one exception, I saw no appreciable difference in lookup time
  between a directory with 4096 files and one with 2^20. That may
  indicate a flaw in my methodology, since it wasn't what I expected.
- The sole exception to the above was vfat, which exhibited O(N)
  lookup time and broke down completely after 32,768 files. Up to
  just over 1024 files per directory, vfat's lookup time was similar
  to that of the other filesystems.

I don't know whether my results demonstrate anything beyond the
codepath taken through the Linux filesystem drivers, but here's what
I'm going to do anyway:

> - For 1.5, FSFS repositories will be created as sharded by default.
> We'll bump the FSFS format number from 2 to 3 (meaning: can have
> svndiff1, sharded).
>

- We'll create shards of 1000 entries each. At 1000 files per shard,
  the shard directory itself only grows past 1000 entries once the
  repository passes a million revisions, and anyone with a repository
  that large will likely be running on a decent filesystem. Even if
  they aren't, we'll still be much better off than we were before.
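
  As an illustration, mapping a revision to its file under that
  layout is just an integer division. This is a hypothetical helper,
  not actual FSFS code, and the path layout is my approximation of
  the scheme posted earlier:

    #include <stdio.h>

    #define FILES_PER_SHARD 1000  /* the proposed shard size */

    /* Illustrative only: map a revision number to the path of its
       revision file under a sharded layout. */
    static void rev_path(char *buf, size_t len, long rev)
    {
      snprintf(buf, len, "revs/%ld/%ld", rev / FILES_PER_SHARD, rev);
    }

    int main(void)
    {
      char buf[64];
      rev_path(buf, sizeof(buf), 1234567);
      printf("%s\n", buf);  /* prints "revs/1234/1234567" */
      return 0;
    }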

- We'll write out the filesystem organisation scheme to a file in the
  filesystem, so that the files-per-shard (or scheme) can change in the
  future. We'll read that in at filesystem open time and cache it in
  the per-filesystem cached data: it'll be read at most once per process.

  + Anyone changing that file without a good understanding of what it
    does will win a broken filesystem. I'm not about to protect people
    from stupid administrators.
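
  As a sketch of the open-time side (hypothetical: the on-disk format
  of that file hasn't been nailed down here), reading the scheme
  could be as simple as:

    #include <stdio.h>

    /* Illustrative reader: scan for a description line such as
       "layout sharded 1000" and return the files-per-shard value,
       0 for a linear layout, or -1 if the file is absent (an older,
       unsharded filesystem). */
    static long read_files_per_shard(const char *path)
    {
      FILE *f = fopen(path, "r");
      char line[128];
      long n = 0;

      if (!f)
        return -1;
      while (fgets(line, sizeof(line), f))
        if (sscanf(line, "layout sharded %ld", &n) == 1)
          break;
      fclose(f);
      return n;
    }

  The result would be stashed in the per-filesystem cached data, as
  described above, so the file is read at most once per process.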

> - The revision files will be named according to the scheme I posted
> earlier, and shards created on-demand.
>
> - We'll add a --pre-1.5-compatible flag to svnadmin (and the equivalent
> to the fs-config hash), which will create FSFS filesystems with format
> 2 by default, and be a no-op for BDB filesystems. --pre-1.4-compatible
> will also imply --pre-1.5-compatible.
>
> - We'll create a tool to do an offline upgrade between formats 2 and 3
> (it'll run _much_ faster than a dump/load, and the people who'd benefit
> most from this change are also the ones that can least allow time for
> a dump/load). Since FSFS has no way to lock out readers, we'll have
> to ask the repository administrator to make sure there aren't any.
>
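
To make the compatibility flag concrete, creating an old-format
repository under this plan would look like the following (flag name
as quoted above; the final syntax may of course differ):

  $ svnadmin create --pre-1.5-compatible /path/to/repos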

Regards,
Malcolm

