On Fri, Apr 06, 2007 at 05:49:33AM -0400, Greg Hudson wrote:
> On Fri, 2007-04-06 at 10:49 +0200, Norbert Unterberg wrote:
> > Problem:
> > FSFS can become slow because it creates too many entries in a single directory.
>
> Actually, I haven't seen any results that measure a speedup in FSFS
> performance from any kind of sharding, so this isn't necessarily a good
> problem statement.
>
That's correct. This weekend I finally got around to properly testing the
performance of a sharded vs. linear scheme on my copy of the ASF
repository, and I measured virtually no difference between the two
layouts. (I only tested read operations, but those are what we mostly care
about, given the ratio of reads to writes.)
For a couple of operations I did measure a <1% slowdown with the sharded
scheme; however, most of the results were within the noise.
> However, listing the directory (with anything from Windows Explorer to
> "ls") or backing it up can become slow in some cases, and some
> particularly unfortunate filesystems will simply refuse to store more
> than some number of files in a directory.
>
Exactly. This is to allow Subversion to work without fiddling on both
low-end filesystems (VFAT: max 64k files per directory, quadratic
performance past ~4k files) and higher-end ones (NFS NAS boxes, which
typically limit directory size by default), and to some extent to make
life easier for the repository administrator.
> > If I understand your solution, you violate the constraint as soon as
> > the repository reaches revision 1,000,000, because it would create
> > the 1001st entry in db/revs/.
>
> The idea is that if you have more than a million revs, you're hopefully
> running on a good filesystem and using good tools; even if you aren't,
> whatever problem you're seeing is only 1/1000 as bad as it used to be.
>
Yup.
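To make the "1001st entry" and "1/1000 as bad" arithmetic concrete, here is a minimal Python sketch of a one-level sharded layout. It assumes a shard size of 1000 revisions per directory (as implied by the numbers in this thread) and a revs/<shard>/<rev> naming; rev_path and shard_count are hypothetical helpers for illustration, not Subversion's actual API.

```python
SHARD_SIZE = 1000  # assumption: revisions per shard directory, as implied above


def rev_path(rev: int, shard_size: int = SHARD_SIZE) -> str:
    """Hypothetical helper: sharded location of a revision file, relative to db/."""
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    shard = rev // shard_size  # integer division picks the shard directory
    return f"revs/{shard}/{rev}"


def shard_count(youngest: int, shard_size: int = SHARD_SIZE) -> int:
    """Hypothetical helper: number of entries in revs/ once revs 0..youngest exist."""
    return youngest // shard_size + 1


if __name__ == "__main__":
    print(rev_path(999))              # last revision in shard 0
    print(rev_path(1000))             # first revision in shard 1
    print(shard_count(999_999))       # revs/ holds exactly 1000 shards
    print(shard_count(1_000_000))     # revision 1,000,000 adds the 1001st entry
```

With a linear layout, revision 1,000,000 would live in a single directory of 1,000,001 files; with this sharded layout, revs/ holds 1001 subdirectories and no shard holds more than 1000 files, which is roughly the thousandfold reduction described above.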
> Given how long it took for sharding to become enough of an itch to
> scratch in the first place, I don't think it's worth the added
> complexity to implement multi-level sharding.
>
And yup. In fact, it would probably be counter-productive. (In
microbenchmarks, I saw that adding extra filename components slowed name
lookup more than the number of files per directory did [except on VFAT,
of course].)
(I committed the patch to trunk in r24576, by the way.)
Regards,
Malcolm
Received on Mon Apr 16 12:01:41 2007