On Mon, Mar 05, 2007 at 01:57:03PM -0500, Greg Hudson wrote:
> Ben asked me to chime in.
>
Thanks!
> On Mon, 2007-03-05 at 07:08 +0000, Malcolm Rowe wrote:
> > The first is the ability to split your revs/ and revprops/ directories
> > into separate 'buckets' or 'shards', so that we don't require a
> > repository with a million revs to contain a million files in one
> > directory.
>
> I think this is a relatively easy and quick hack, and I've commented in
> the past that I'd like to see it done at least to measure its
> performance impact. So I'd say go for it. I used to have an opinion on
> the exact format of the splitting, but I've long since forgotten what it
> is.
Yes, I'll try to measure the impact, though I doubt it'll show up for
small repositories (fewer than a hundred thousand revisions?), since
they can probably hold all the names in the equivalent of the dentry
cache.
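
As a rough illustration of the sharding idea (the thread does not settle
on a layout, so the shard size and the revs/<shard>/<rev> naming below
are assumptions, and sharded_rev_path is a hypothetical helper), a
revision number could be mapped to its bucket like this:

```python
import os

# Hypothetical shard size; the thread does not agree on a value.
REVS_PER_SHARD = 1000


def sharded_rev_path(fs_dir, rev, per_shard=REVS_PER_SHARD):
    """Return the path of a revision file under an assumed sharded layout.

    Revisions 0..per_shard-1 land in revs/0/, the next per_shard
    revisions in revs/1/, and so on, so no single directory holds more
    than per_shard revision files.
    """
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    shard = rev // per_shard
    return os.path.join(fs_dir, "revs", str(shard), str(rev))
```

With these assumed values, a million revisions would spread over a
thousand directories of a thousand files each, rather than one
directory of a million.
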
> Some of the subsequent discussion centered around whether we should make
> it the default (thus making 1.5 repositories unreadable by 1.4 backend
> code, and possibly breaking some hackish third-party tools which make
> assumptions about the current FSFS layout).
(Well, I'm assuming we'd bump the fs format number, so those tools
should really be checking :-))
> I'd suggest not making it
> the default in 1.5 unless it has a marked performance benefit on a
> repository of, say, 10,000 revs on a common filesystem. When 1.6 rolls
> around, making it the default would be less of an issue.
Why do you think that it'd be easier to make it the default for 1.6?
(I'm really not sure myself - I can see advantages to both
on-by-default and off-by-default.) Because the hacky tools would have
grown to understand it?
> Note that changing from a sharded layout to a non-sharded layout does
> not necessarily require a dump and reload. We could create a tools/
> script to reorganize an existing repository as an offline (but quick)
> operation.
Sure. Offline is the key here, though, even if it is a quick operation.
(We probably _should_ write a script, though.)
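
A minimal sketch of what such an offline reorganization might look like,
assuming the same hypothetical <shard>/<rev> layout and a made-up helper
name (shard_directory). It builds the new layout in a separate directory
rather than in place, because a revision file named "0" would collide
with a shard directory named "0". It does not touch the format number
and assumes nothing else is accessing the repository:

```python
import os


def shard_directory(src, dst, per_shard=1000):
    """Move numerically named files from src into dst/<shard>/<name>.

    Assumes the repository is offline and that src and dst are on the
    same filesystem, so each os.rename is a cheap metadata operation.
    Files whose names are not plain decimal numbers are left in src.
    Returns the number of files moved.
    """
    moved = 0
    for name in os.listdir(src):
        old = os.path.join(src, name)
        if not (name.isascii() and name.isdigit() and os.path.isfile(old)):
            continue
        shard_dir = os.path.join(dst, str(int(name) // per_shard))
        os.makedirs(shard_dir, exist_ok=True)
        os.rename(old, os.path.join(shard_dir, name))
        moved += 1
    return moved
```

The caller would then swap dst into place (and handle revprops/ the same
way); that step is left out here.
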
> > For those people who are using a NAS to store the repository, FSFS
> > really really really sucks.
>
> I've had decent experience using it in AFS, which I'm guessing has a
> lower penalty for opening and closing files than NFS does (or at least,
> the implementation of NFS you're testing with).
Yes, it seems to be caused by the close-to-open cache consistency
required by NFS. It's possibly just the Linux implementation, but, well,
that's the common case I'm looking at.
> Your proposed change seems okay, but I'm not sure how successful it
> might be in a multi-vendor environment. The repository tells me to
> marshal transactions in /tmp, which works great on my Unix systems, but
> my Windows systems don't have a C:\tmp directory so it fails.
Agh, good point. People use file:// access over SMB too, and while that
wasn't the use case I was thinking of, there's no reason to make it not
work.
> Perhaps
> the option should just be a boolean "marshal transactions in temporary
> storage", so that Subversion can figure out the most appropriate
> temporary directory, worry about not stomping on other users on the same
> machine, etc. Just an idea.
Yes, that sounds sensible. I suggested to lundblad the possibility of
putting it into <tmp>/svn.<uuid>/<txn>. I guess we'd have to make sure
we were careful about permissions, but that should work.
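
To make the <tmp>/svn.<uuid>/<txn> idea concrete, here is a small
sketch. The function name and parameters are hypothetical, and it lets
the platform choose the temporary directory instead of hard-coding /tmp,
which is the point of the boolean option suggested above:

```python
import os
import tempfile


def txn_staging_dir(repos_uuid, txn_name, tmp_root=None):
    """Create and return <tmp>/svn.<uuid>/<txn> for staging a transaction.

    tempfile.gettempdir() picks a suitable location per platform (it
    honours TMPDIR, TEMP and TMP), so no /tmp or C:\\tmp is assumed.
    Directories are requested with mode 0o700 so other local users
    cannot read them; on Windows the mode is largely ignored.

    Not handled here: checking who owns a pre-existing svn.<uuid>
    directory, which a real implementation would need to do.
    """
    root = tmp_root or tempfile.gettempdir()
    repo_dir = os.path.join(root, "svn." + repos_uuid)
    os.makedirs(repo_dir, mode=0o700, exist_ok=True)
    txn_dir = os.path.join(repo_dir, txn_name)
    # os.mkdir fails if the directory already exists, so two callers
    # cannot silently share one transaction directory.
    os.mkdir(txn_dir, mode=0o700)
    return txn_dir
```
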
Regards,
Malcolm
Received on Mon Mar 5 20:15:28 2007