On Mon, Mar 05, 2007 at 12:09:01AM -0800, Karl Fogel wrote:
> Malcolm Rowe <malcolm-svn-dev@farside.org.uk> writes:
> > I've been thinking a bit recently about FSFS's scalability and
> > performance, and there are two non-backward-compatible things that I'd
> > like to be able to implement.
> >
> > The first is the ability to split your revs/ and revprops/ directories
> > into separate 'buckets' or 'shards', so that we don't require a
> > repository with a million revs to contain a million files in one
> > directory.
>
> Why would this be non-backward-compatible? I can see why it's
> not forward-compatible, but backward-compatibility should be possible
> here.
>
Perhaps I meant forward-compatible - I always get those muddled. What I
meant is that an older client can't access a sharded repository. New
clients can access older repositories, of course.
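
To make the sharding idea concrete, here is a minimal sketch of how a revision number could map to a file inside a bucket. The layout (revs/<shard>/<rev>), the function name, and the parameter names are all assumptions for illustration, not a settled design:

```python
import posixpath


def sharded_rev_path(fs_root, rev, max_files_per_dir):
    """Return a hypothetical path for a revision file in a sharded layout.

    Assumed layout: <fs_root>/revs/<shard>/<rev>, where
    shard = rev // max_files_per_dir. This is an illustration only.
    """
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    if max_files_per_dir <= 0:
        raise ValueError("max_files_per_dir must be positive")
    shard = rev // max_files_per_dir
    return posixpath.join(fs_root, "revs", str(shard), str(rev))


if __name__ == "__main__":
    # Revisions 0..999 share shard 0; revision 1000 starts shard 1.
    for rev in (0, 999, 1000, 123456):
        print(rev, sharded_rev_path("db", rev, 1000))
```

An unsharded repository would keep the flat revs/<rev> layout, which is why a client that only knows the flat layout cannot find revisions in a sharded one.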
> > local disk until it's time to do the final commit.
>
> Nice workaround!
>
Thanks! I think it's got the potential to really speed up commits
against NAS servers.
> > - Accept FSFS filesystem options at 'svnadmin create' time.
> > (perhaps in the cases above we'd name them --fsfs-max-files-per-dir=N
> > and --fsfs-local-txn-dir=/foo).
>
> Are you sure max-files-per-dir has to be configurable? If we're going
> to have the code in there anyway, would it be possible to just pick
> some reasonable values and then make this the new default for FSFS?
> (With the old format still supported, of course.) Are you sure it
> would be so much less efficient than the current storage mechanism
> that we need to retain the option of not sharding?
>
Well, I guess my woolly line of thinking goes like this:
- Most people probably don't need this feature. You'd need quite a lot
of revisions to make it worthwhile.
- I do need some way to create repositories that older clients can access.
If sharding is on by default, I could either use a
--pre-1.5-compatible switch (or --fsfs-not-sharded), or, if it's off
by default, something like --fsfs-sharded.
- I don't have the information to choose the most efficient size of
directory. I could choose a reasonable value (4096? 10000?) and
that'd be okay...
- ... but if I need per-filesystem options for the local-txn-storage
idea anyway...
- ... then I could punt on making a decision and make the user specify
the number of files if they want sharded storage...
- ... and if they don't know they want it, they don't get it, so our
repositories are compatible with 1.4 by default.
I guess the things holding me back from making it the default are:
- it's inelegant if you have a decent filesystem.
- we still need to support both methods, it's just a matter of where
the switch is located.
- I can't really set a policy for the right number of files myself.
- I quite like the idea of making it opt-in rather than opt-out, and
of reusing the options mechanism to do so.
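
For a rough feel of the trade-off between the candidate sizes mentioned above (4096 or 10000), this sketch counts how many shard directories a repository would need and how many files the fullest one would hold. It assumes a simple fixed-size scheme with one file per revision; the function name is hypothetical:

```python
def shard_stats(total_revs, max_files_per_dir):
    """Return (number of shard directories, files in the fullest shard).

    Assumes one file per revision and fixed-size shards; illustration only.
    """
    if total_revs < 0 or max_files_per_dir <= 0:
        raise ValueError("need total_revs >= 0 and max_files_per_dir > 0")
    # Ceiling division without floating point.
    shards = (total_revs + max_files_per_dir - 1) // max_files_per_dir
    fullest = min(total_revs, max_files_per_dir)
    return shards, fullest


if __name__ == "__main__":
    for size in (4096, 10000):
        shards, fullest = shard_stats(1_000_000, size)
        print(f"{size} files/dir: {shards} shards, up to {fullest} files each")
```

With a million revisions, 4096 files per directory gives 245 shards and 10000 gives 100; which is faster depends on the underlying filesystem, which this sketch does not measure.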
Regards,
Malcolm
Received on Mon Mar 5 10:25:26 2007