
Sharded FSFS repositories - summary

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2007-03-13 13:00:50 CET

Here's a summary of the discussions about sharding FSFS repositories,
and what I'd like to do for 1.5.0.

Generally, everyone seems to like the idea. However, no-one wanted to
be forced to pick a shard size, and it didn't look like anyone actually
wanted the size to be configurable either ("the fewer decisions that
have to be made at repository creation time the better"). Nobody liked
the configuration scheme as a way of storing the shard size.

Regarding whether it should be the default for 1.5-created filesystems:

Mattias Engdegård made the point that it's unlikely that we'd see a
difference between the two formats with a decent (tree-based) filesystem
(so there's no real reason to not go with the sharded format as the
default). Karl also wants it to be the default format.

Greg preferred not making it the default until 1.6, so that tools that
read the repository directly (e.g. previous versions of Subversion) had
time to be updated. I'm not convinced by this argument, particularly
because we already did it in 1.4 with svndiff1 support, and I didn't see
anyone complaining.

There was some discussion about whether we should consider alternate
sharding schemes to the one I originally posted (r12345 goes into
revs/N/12345 where N = 12345/constant, rounded down). I'm not
particularly in favour of
anything more complicated unless someone can prove that it actually
makes a difference. I also like keeping the full revision name around -
it allows us to ensure that we can always uniquely identify each
revision just by the basename.
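
As a rough illustration of that scheme (not the actual FSFS code), the
path for a revision could be computed as below. The function name and
the idea of passing the filesystem directory as an argument are
assumptions made for this sketch; the only things taken from the
proposal are the revs/N/REV layout and N being the integer quotient of
the revision number and the shard size.

```python
import posixpath


def sharded_rev_path(fs_dir, rev, shard_size):
    """Return the path of revision `rev` under the proposed sharded layout.

    Hypothetical helper: `fs_dir` is assumed to be the directory that
    contains `revs/`. The basename stays the full revision number, so a
    revision can still be identified from the basename alone.
    """
    if rev < 0 or shard_size <= 0:
        raise ValueError("rev must be >= 0 and shard_size must be > 0")
    shard = rev // shard_size  # N = rev / constant, rounded down
    return posixpath.join(fs_dir, "revs", str(shard), str(rev))


# With a shard size of 4000, r12345 lands in shard 3.
print(sharded_rev_path("db", 12345, 4000))  # db/revs/3/12345
```

Because the shard is derived purely from the revision number and one
constant, nothing extra has to be stored per revision.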

Conversion:
Greg suggested we create a tool to do an offline reorganisation of the
repository. This should actually be pretty fast.

Karl would like us to auto-upgrade repositories by default. We _can_ do
that, but only if the repository administrator is first able to exclude
all old readers (i.e. bump the format and then ensure that all older
clients have finished reading). I quite like the idea, but I'm not sure
the complexity is worthwhile if we can provide a quick offline upgrade
tool.

So, here's my plan so far. Any comments?

- For 1.5, FSFS repositories will be created as sharded by default.
  We'll bump the FSFS format number from 2 to 3 (meaning: can have
  svndiff1, sharded).

- We'll create shards of 4000 entries each. That's large enough that
  someone would have to hit 16M revisions (4000 shards of 4000 entries)
  before a larger value would be an improvement, but small enough that
  it has reasonable performance (and support) on all filesystems that
  I'm aware of. It's also a round number, so easier for humans to
  understand.

- The revision files will be named according to the scheme I posted
  earlier, and shards created on-demand.

- We'll add a --pre-1.5-compatible flag to svnadmin (and the equivalent
  to the fs-config hash), which will create FSFS filesystems with format
  2 by default, and be a no-op for BDB filesystems. --pre-1.4-compatible
  will also imply --pre-1.5-compatible.

- We'll create a tool to do an offline upgrade between formats 2 and 3
  (it'll run _much_ faster than a dump/load, and the people who'd benefit
  most from this change are also the ones that can least allow time for
  a dump/load). Since FSFS has no way to lock out readers, we'll have
  to ask the repository administrator to make sure there aren't any.
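
To give a feel for why the offline upgrade should be much faster than a
dump/load, here is a minimal sketch of the reorganisation step: it only
renames files, it never rewrites revision contents. This is not the
planned tool. The function name, the "revs.sharded" staging directory,
and the choice to stage the new layout beside the old one are all
assumptions of this sketch (staging avoids a clash between, say, the
revision file revs/0 and the shard directory revs/0). Bumping the format
number and checking that there are no readers or writers are left to the
caller.

```python
import os


def shard_revs(fs_dir, shard_size):
    """Move fs_dir/revs/<rev> to fs_dir/revs/<rev // shard_size>/<rev>.

    Hypothetical offline helper: assumes nothing else is reading or
    writing the repository while it runs. Returns the number of
    revision files moved.
    """
    revs_dir = os.path.join(fs_dir, "revs")
    staging_dir = os.path.join(fs_dir, "revs.sharded")  # assumed name

    # Snapshot the flat listing; revision files have purely numeric names.
    revs = sorted(
        int(name)
        for name in os.listdir(revs_dir)
        if name.isdigit() and os.path.isfile(os.path.join(revs_dir, name))
    )

    os.mkdir(staging_dir)  # fails if a previous run was interrupted
    for rev in revs:
        shard_dir = os.path.join(staging_dir, str(rev // shard_size))
        os.makedirs(shard_dir, exist_ok=True)
        # Same filesystem, so this is a rename and not a copy.
        os.rename(os.path.join(revs_dir, str(rev)),
                  os.path.join(shard_dir, str(rev)))

    os.rmdir(revs_dir)  # refuses to run if anything unexpected is left
    os.rename(staging_dir, revs_dir)
    return len(revs)
```

The window between the last two calls, where revs/ briefly does not
exist, is one reason a sketch like this is only safe with all readers
excluded.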

Regards,
Malcolm

Received on Tue Mar 13 13:01:07 2007

This is an archived mail posted to the Subversion Dev mailing list.