
Re: Sharded FSFS repositories - summary

From: Malcolm Rowe <malcolm-svn-dev_at_farside.org.uk>
Date: 2007-03-14 10:16:29 CET

On Wed, Mar 14, 2007 at 08:54:14AM +0100, Ph. Marek wrote:
> On Tuesday 13 March 2007 15:34, Malcolm Rowe wrote:
> > But neither of those are the main reason to do this. (realistically,
> > how many times a month will a typical admin do an 'ls' in revs/ ?)
> >
> > 4000 revs is a good compromise: it's big enough that it scales to large
> > repositories (ASF's repository would be halfway towards needing another
> > level if we went with 1000 files-per-shard), and it's small enough that
> > it works everywhere we need it to (even on Coda, it seems :-)).
> Well, with 4000 you don't know where r454513 is, do you?

Yes, you do, if you use a calculator.

Again, I'm not sure exactly what use case you're thinking about that
requires admins to look for revision files so regularly :-)
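
The calculator arithmetic is a single integer division. A minimal sketch, assuming a one-level layout of the form revs/<shard>/<rev> in which the shard is the revision number divided by the shard size, rounded down (the function name and the path format are illustrative, not taken from the Subversion source):

```python
def shard_path(rev: int, shard_size: int = 4000) -> str:
    """Return the assumed relative path of a revision file under revs/."""
    if rev < 0 or shard_size <= 0:
        raise ValueError("rev must be >= 0 and shard_size must be > 0")
    shard = rev // shard_size  # integer division picks the shard directory
    return f"revs/{shard}/{rev}"


print(shard_path(454513))        # revs/113/454513
print(shard_path(454513, 1000))  # revs/454/454513
```

With 1000 files per shard the shard is just the revision number with its last three digits dropped; with 4000 it takes the division, but nothing more.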

> > It doesn't look like multi-level trees would be needed for performance
> > until you hit somewhere around c.100M revisions, and I'm not aware of
> > anyone who's anywhere near that level yet :-)
> As you said above, ASF (and KDE) are about to get a million revisions ... so
> with 1000 three levels would be better.
>

Well, it does obviously depend upon the number of child entries you
restrict each directory to holding. I don't yet have hard numbers
to inform me about what the typical characteristics are for various
filesystems - I'm working on that now.
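
To make the dependence on the per-directory limit concrete, here is a rough sketch. It assumes the same limit applies to every directory, including the top-level revs/ directory, so that n levels of shard directories can hold limit ** (n + 1) revision files. The helper names are hypothetical, and this only counts entries; it says nothing about when performance actually degrades on a given filesystem.

```python
def capacity(per_dir: int, shard_levels: int) -> int:
    """Revision files that fit with the given number of shard-directory levels."""
    return per_dir ** (shard_levels + 1)


def shard_levels_needed(revisions: int, per_dir: int) -> int:
    """Smallest number of shard levels whose capacity covers `revisions` files."""
    if per_dir < 2 or revisions < 0:
        raise ValueError("per_dir must be >= 2 and revisions must be >= 0")
    levels = 0
    while capacity(per_dir, levels) < revisions:
        levels += 1
    return levels


print(capacity(1000, 1))                     # 1000000
print(capacity(4000, 1))                     # 16000000
print(shard_levels_needed(1_000_001, 1000))  # 2
print(shard_levels_needed(1_000_001, 4000))  # 1
```

Under that assumption a repository just past a million revisions would need a second level at 1000 per directory, but stays well within one level at 4000.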

> > Sure, but it's the complexity that concerns me - we really need to
> > demonstrate a tangible benefit to make it that much more complex.
> Ok.

(Karl did point out that sticking the magic number in the file makes it
possible to change it later, so that's one obvious benefit.)
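
As an illustration of that point, a sketch of reading the shard size from a format file instead of hard-coding it. The line syntax used here ("layout sharded N") and the function name are assumptions for the example; this message does not specify how the number would be stored.

```python
from typing import Optional


def read_shard_size(format_text: str) -> Optional[int]:
    """Return the shard size named in the format text, or None if unsharded.

    Assumes a hypothetical line of the form "layout sharded N".
    """
    for line in format_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[:2] == ["layout", "sharded"]:
            return int(fields[2])
    return None


print(read_shard_size("3\nlayout sharded 4000\n"))  # 4000
print(read_shard_size("3\n"))                       # None
```

Because each repository carries its own value, code that locates revision files reads the number per repository, and a later release could pick a different default without breaking existing repositories.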

> > > Have you seen my mail regarding the transaction-directories? Maybe the
> > > naming there could be done with the same function.
> > They could, but how frequently do you commit transactions with 100,000
> > changed files? Maybe on an initial import, but in that case the time
> > spent writing the data is going to dwarf the time spent looking up the
> > entries, or at least that's my intuition. You're quite welcome to
> > benchmark the difference to see what it actually is.
> I did, some time ago.
>
> As a side note: just a "dpkg-query -L <packages>" of the changed packages in
> debian-unstable (from yesterday to today) gives 2638 lines. That includes
> directories -- which are not files -- but they have properties too, like
> normal files.
> So if you dist-upgrade only once a week, you're likely to get 10 000 files
> changed.
>

This seems like rather an odd use case. But anyway, even with 10,000
files, I'm not sure that you'll see any real slowdown. Like I said, I'm
trying to quantify that now.

Regards,
Malcolm

Received on Wed Mar 14 10:16:42 2007

This is an archived mail posted to the Subversion Dev mailing list.