
Re: Sharded FSFS repositories - summary

From: John Peacock <jpeacock_at_rowman.com>
Date: 2007-03-13 14:51:20 CET

Malcolm Rowe wrote:
> - We'll create shards of 4000 entries each. That's large enough that
> someone would have to hit 16M revisions before a larger value would be
> an improvement, but small enough that it has reasonable performance
> (and support) on all filesystems that I'm aware of. It's also a
> power-of-ten, so easier for humans to understand.
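The 16M figure in the quoted proposal appears to be the shard size squared. With S revisions per shard, revs/ gains one subdirectory per S revisions, so the top level only becomes as crowded as a single shard once the repository passes S × S revisions. A minimal sketch of that arithmetic follows. It assumes that "a larger value would be an improvement" means "the top level has more entries than one shard". The helper name is hypothetical and is not part of Subversion.

```python
def revisions_before_top_level_exceeds(shard_size: int) -> int:
    """Revisions a repository can hold before revs/ contains more shard
    directories than a single shard contains files.

    Assumption (not from Subversion itself): a bigger shard size only
    pays off once the top level is as crowded as one shard.
    """
    if shard_size <= 0:
        raise ValueError("shard size must be positive")
    return shard_size * shard_size


for size in (1000, 4000):
    # 1000 -> 1,000,000 revisions; 4000 -> 16,000,000 revisions
    print(size, revisions_before_top_level_exceeds(size))
```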

I have to say that I find "revs/N/12345, where N = 12345/constant" to be most
human-unfriendly when the constant isn't an actual power of 10. I can't divide large
numbers by 4000 in my head, but I could if it were 1000. I'm also concerned
about the performance characteristics of NTFS in particular, which seems to
degrade much more quickly (to the point where I find it hard even to get a
directory listing of the parent folder when a child folder has thousands of entries).
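The readability point can be illustrated with the revs/N/REV layout, taking N as the integer quotient of the revision number and the shard size. This is only a sketch: `shard_path` is a hypothetical helper, not Subversion code.

```python
def shard_path(rev: int, shard_size: int) -> str:
    """Return revs/N/rev with N = rev // shard_size (integer division).

    Hypothetical helper for illustration; not Subversion's own code.
    """
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    if shard_size <= 0:
        raise ValueError("shard size must be positive")
    return f"revs/{rev // shard_size}/{rev}"


print(shard_path(12345, 4000))  # revs/3/12345  (needs real division)
print(shard_path(12345, 1000))  # revs/12/12345 (drop the last three digits)
```

With a shard size of 1000, the shard number is the revision number with its last three digits removed, which is the mental shortcut a size of 4000 takes away.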

I suggest that we write a quick script to generate a variety of sharding schemes
and test them on multiple filesystems, rather than just picking something out of
thin air. It may be that a multilevel system that is closer to a hashing
algorithm will be superior to any arbitrary [fixed] division.
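A quick script of the kind suggested above might look like the following. It is a rough sketch under several assumptions. Empty placeholder files stand in for real revision files, only single-level revs/N/REV layouts are generated, and "performance" means the wall-clock time to list every directory. All function names are hypothetical. To test other filesystems, such as an NTFS volume, point `base_dir` at a directory on each one. The multilevel, hash-like schemes mentioned above would need their own `build_layout` variant.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def build_layout(root: Path, num_revs: int, shard_size: int) -> None:
    """Create empty files laid out as revs/N/rev, N = rev // shard_size."""
    revs = root / "revs"
    for rev in range(num_revs):
        shard = revs / str(rev // shard_size)
        if rev % shard_size == 0:  # first revision of a new shard
            shard.mkdir(parents=True, exist_ok=True)
        (shard / str(rev)).touch()


def time_listings(root: Path) -> Tuple[float, int]:
    """List revs/ and every shard under it; return (seconds, files seen)."""
    start = time.perf_counter()
    files_seen = 0
    with os.scandir(root / "revs") as top:
        for entry in top:
            if entry.is_dir():
                with os.scandir(entry.path) as shard:
                    files_seen += sum(1 for _ in shard)
    return time.perf_counter() - start, files_seen


def compare(num_revs: int, shard_sizes: List[int],
            base_dir: Optional[str] = None) -> Dict[int, Tuple[float, int]]:
    """Build and time one layout per shard size inside base_dir
    (or the default temp directory), cleaning up after each run."""
    results: Dict[int, Tuple[float, int]] = {}
    for size in shard_sizes:
        with tempfile.TemporaryDirectory(dir=base_dir) as tmp:
            root = Path(tmp)
            build_layout(root, num_revs, size)
            results[size] = time_listings(root)
    return results


if __name__ == "__main__":
    # Small default run; scale num_revs up for meaningful numbers.
    for size, (secs, count) in compare(10_000, [1000, 4000]).items():
        print(f"shard size {size}: listed {count} files in {secs:.4f} s")
```

Timing listings after a fresh build will mostly hit the OS cache. A more careful comparison would also measure cold-cache runs and lookups of individual revision files.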


John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Blvd
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747
Received on Tue Mar 13 14:51:15 2007

This is an archived mail posted to the Subversion Dev mailing list.
