On Wed, Mar 14, 2007 at 08:54:14AM +0100, Ph. Marek wrote:
> On Tuesday 13 March 2007 15:34, Malcolm Rowe wrote:
> > But neither of those is the main reason to do this. (Realistically,
> > how many times a month will a typical admin do an 'ls' in revs/ ?)
> > 4000 revs is a good compromise: it's big enough that it scales to large
> > repositories (ASF's repository would be halfway towards needing another
> > level if we went with 1000 files-per-shard), and it's small enough that
> > it works everywhere we need it to (even on Coda, it seems :-)).
> Well, with 4000 you don't know where r454513 is, do you?
Yes, you do, if you use a calculator.
Again, I'm not sure exactly what use case you're thinking about that
requires admins to look for revision files so regularly :-)
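
For what it's worth, the calculator step is just an integer division. A
minimal sketch, assuming each shard directory is named after the revision
number divided by the shard size; the function name and the
revs/<shard>/<rev> path shape are illustrative assumptions here, not a
settled on-disk format:

```python
def shard_path(rev, files_per_shard=4000):
    """Return a hypothetical relative path for a revision file.

    Assumes the shard directory is simply rev // files_per_shard.
    """
    if rev < 0 or files_per_shard <= 0:
        raise ValueError("rev must be >= 0 and files_per_shard must be > 0")
    shard = rev // files_per_shard
    return f"revs/{shard}/{rev}"


print(shard_path(454513))        # revs/113/454513
print(shard_path(454513, 1000))  # revs/454/454513
```

With 1000 files per shard the directory name is just the revision with the
last three digits dropped, which is the readability argument for a power of
ten; with 4000 you need the division.
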
> > It doesn't look like multi-level trees would be needed for performance
> > until you hit somewhere around 100M revisions, and I'm not aware of
> > anyone who's anywhere near that level yet :-)
> As you said above, ASF (and KDE) are about to get a million revisions ... so
> with 1000 three levels would be better.
Well, it does obviously depend upon the number of child entries you
restrict each directory to holding. I don't yet have hard numbers
to inform me about what the typical characteristics are for various
filesystems - I'm working on that now.
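
As a rough illustration of that dependence: if no directory may hold more
than N entries, a tree of k directory levels can address at most N^k revision
files. The sketch below treats N as a hard cap, which is my simplifying
assumption and not a measured filesystem limit; the function name is
hypothetical:

```python
def levels_needed(revisions, max_entries):
    """Smallest number of directory levels k with max_entries**k >= revisions.

    k == 1 means a single flat directory; k == 2 means one layer of
    shard directories, each holding the revision files.
    """
    if revisions < 1 or max_entries < 2:
        raise ValueError("need revisions >= 1 and max_entries >= 2")
    levels = 1
    capacity = max_entries
    while capacity < revisions:
        capacity *= max_entries
        levels += 1
    return levels


print(levels_needed(500_000, 1000))    # 2
print(levels_needed(1_000_001, 1000))  # 3
print(levels_needed(1_000_001, 4000))  # 2
```

Under that assumption, 1000 entries per directory runs out of two levels at
one million revisions, while 4000 lasts until sixteen million.
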
> > Sure, but it's the complexity that concerns me - we really need to
> > demonstrate a tangible benefit to make it that much more complex.
(Karl did point out that sticking the magic number in the file makes it
possible to change it later, so that's one obvious benefit.)
> > > Have you seen my mail regarding the transaction-directories? Maybe the
> > > naming there could be done with the same function.
> > They could, but how frequently do you commit transactions with 100,000
> > changed files? Maybe on an initial import, but in that case the time
> > spent writing the data is going to dwarf the time spent looking up the
> > entries, or at least that's my intuition. You're quite welcome to
> > benchmark the difference to see what it actually is.
> I did, some time ago.
> As a side note: just a "dpkg-query -L <packages>" of the changed packages in
> debian-unstable (from yesterday to today) gives 2638 lines. That includes
> directories -- which are not files -- but they have properties too, like
> normal files.
> So if you dist-upgrade only once a week, you're likely to get 10 000 files
This seems like rather an odd use case. But anyway, even with 10,000
files, I'm not sure that you'll see any real slowdown. Like I said, I'm
trying to quantify that now.
Received on Wed Mar 14 10:16:42 2007