Here's a summary of the discussions about sharding FSFS repositories,
and what I'd like to do for 1.5.0.
Generally, everyone seems to like the idea. However, no-one wanted to
be forced to pick a shard size, and it didn't look like anyone actually
wanted the size to be configurable either ("the fewer decisions that
have to be made at repository creation time the better"). Nobody liked
the configuration scheme as a way of storing the shard size.
Regarding whether it should be the default for 1.5-created filesystems:
Mattias Engdegård made the point that it's unlikely that we'd see a
difference between the two formats with a decent (tree-based) filesystem
(so there's no real reason to not go with the sharded format as the
default). Karl also wants it to be the default format.
Greg preferred not making it the default until 1.6, so that tools that
read the repository directly (e.g. previous versions of Subversion) had
time to be updated. I'm not convinced by this argument, particularly
because we already did it in 1.4 with svndiff1 support, and I didn't see
There was some discussion about whether we should consider alternate
sharding schemes to the one I originally posted (r12345 goes into
revs/N/12345 where N = 12345/constant). I'm not particularly in favour of
anything more complicated unless someone can prove that it actually
makes a difference. I also like keeping the full revision name around -
it allows us to ensure that we can always uniquely identify each
revision just by the basename.
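
To make the scheme concrete, here's a minimal sketch of the mapping (not
the FSFS code itself). The names rev_path and SHARD_SIZE are made up for
illustration, and 4000 is simply the shard size proposed further down in
this mail.

```python
import posixpath

# Assumption: the shard size proposed in this mail; not a value read from FSFS.
SHARD_SIZE = 4000


def rev_path(rev, shard_size=SHARD_SIZE):
    """Return the sharded path 'revs/N/<rev>' with N = rev // shard_size."""
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    shard = rev // shard_size  # integer division picks the shard directory
    # The full revision number is kept as the basename, so a revision can
    # still be identified from the file name alone.
    return posixpath.join("revs", str(shard), str(rev))
```

With a shard size of 4000, rev_path(12345) gives "revs/3/12345", and the
basename is still the full revision number.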
Greg suggested we create a tool to do an offline reorganisation of the
repository. This should actually be pretty fast.
Karl would like us to auto-upgrade repositories by default. We _can_ do
that, but only if the repository administrator is first able to exclude
all old readers (i.e. bump the format and then ensure that all older
clients have finished reading). I quite like the idea, but I'm not sure
the complexity is worthwhile if we can provide a quick offline upgrade
So, here's my plan so far. Any comments?
- For 1.5, FSFS repositories will be created as sharded by default.
We'll bump the FSFS format number from 2 to 3 (meaning: can have
- We'll create shards of 4000 entries each. That's large enough that
someone would have to hit 16M revisions (4000 shards of 4000 entries)
before a larger value would be an improvement, but small enough that it
has reasonable performance (and support) on all filesystems that I'm
aware of. It's also a round decimal number, so easier for humans to
understand.
- The revision files will be named according to the scheme I posted
earlier, and shards created on-demand.
- We'll add a --pre-1.5-compatible flag to svnadmin (and the equivalent
to the fs-config hash), which will create FSFS filesystems with format
2 by default, and be a no-op for BDB filesystems. --pre-1.4-compatible
will also imply --pre-1.5-compatible.
- We'll create a tool to do an offline upgrade between formats 2 and 3
(it'll run _much_ faster than a dump/load, and the people who'd benefit
most from this change are also the ones that can least allow time for
a dump/load). Since FSFS has no way to lock out readers, we'll have
to ask the repository administrator to make sure there aren't any.
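
For a feel of why the offline reorganisation should be quick, here's a
rough sketch of the file-moving part only. It is not the proposed tool:
the function name shard_flat_revs and the temporary "<N>.shard" directory
names are invented for this example, it doesn't touch the format number
or anything outside the one directory it's given, and it assumes the
administrator has already made sure there are no readers or writers.

```python
import os


def shard_flat_revs(revs_dir, shard_size=4000):
    """Move flat files revs_dir/<rev> to revs_dir/<rev // shard_size>/<rev>.

    Returns the number of revision files moved.
    """
    # Only regular files with numeric names count as revisions; existing
    # (numeric) shard directories are left alone.
    names = [n for n in os.listdir(revs_dir)
             if n.isdecimal() and os.path.isfile(os.path.join(revs_dir, n))]

    # Phase 1: move each file into a temporary '<shard>.shard' directory.
    # A temporary name is needed because, for example, the file 'revs/0'
    # would otherwise collide with the shard directory 'revs/0'.
    shards = set()
    for name in names:
        shard = int(name) // shard_size
        tmp_dir = os.path.join(revs_dir, "%d.shard" % shard)
        os.makedirs(tmp_dir, exist_ok=True)
        os.rename(os.path.join(revs_dir, name), os.path.join(tmp_dir, name))
        shards.add(shard)

    # Phase 2: no flat revision files remain, so the temporary directories
    # can now take their final numeric names.
    for shard in sorted(shards):
        os.rename(os.path.join(revs_dir, "%d.shard" % shard),
                  os.path.join(revs_dir, str(shard)))
    return len(names)
```

Every step is a rename within one directory tree, with no file contents
read or rewritten, which is where the speed advantage over a dump/load
would come from.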
Received on Tue Mar 13 13:01:07 2007