
Re: [RFC] FSFS filesystem options (long, sorry)

From: Ph. Marek <philipp.marek_at_bmlv.gv.at>
Date: 2007-03-06 09:26:03 CET

On Monday 05 March 2007 17:23, Mattias Engdegård wrote:
> "Michael Sinz" <Michael.Sinz@sinz.org> writes:
> >> It's not a matter of storage space but performance. Shorter names mean
> >> that more of them fit into a directory block on disk, more of them fit
> >> into cache, and more of them are returned with each getdents() system
> >> call.
> >
> >That would be true for which filesystems? (That is, with names this
> > short) Many of them allocate space in granular chunks to keep things
> > on 32-bit and/or 64-bit boundaries at a minimum - if not full cache-line
> > sizes.
>
> Most FFS-derived (including ext2/3) file systems pack entries inside
> directory blocks, because the maximum file name size is much bigger
> than the typical name. This should be true for most B-tree based file
> systems as well.
...
> That statement is not generally true. For both x86 and PowerPC, the
> penalty for unaligned accesses is generally small, and for any
> architecture definitely worth it if it can decrease the number of disk
> accesses. Remember that we are not talking about data structure layout
> in memory here.
Thinking out loud:
A directory entry has (approx.) an 8-byte inode#, 4 bytes of flags, a 4-byte
length, and the name itself - for 10-byte names (999 999 999 + \0) we get
26 bytes. Rounding up to 32 bytes we get 128 entries per 4 kByte directory
block. Rounding that down to a decimal-friendly number gives 100 entries per
directory.
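
The arithmetic above can be checked with a small sketch. The field sizes are
the approximate ones from this mail, not the layout of any particular
filesystem, and the helper names are made up for illustration:

```python
# Rough directory-entry arithmetic, using the approximate field sizes
# quoted in the mail (assumptions, not a real on-disk format).
INODE_BYTES = 8
FLAGS_BYTES = 4
LENGTH_BYTES = 4
BLOCK_BYTES = 4096


def dirent_size(name_bytes):
    """Unpadded size of one entry for a name of name_bytes bytes."""
    return INODE_BYTES + FLAGS_BYTES + LENGTH_BYTES + name_bytes


def round_up_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


raw = dirent_size(10)              # 10-byte name including the trailing \0
slot = round_up_pow2(raw)          # padded entry size
per_block = BLOCK_BYTES // slot    # entries per 4 kByte directory block
print(raw, slot, per_block)
```

Any padding rule other than "next power of two" would change the numbers, so
treat 128 as an estimate.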

If these are arranged in the form
  rev. 1: 00/00/1
  rev. 2: 00/00/2
  rev. 101: 00/10/1
  rev. 10001: 10/00/1
we get cache locality, and the files are easily found by hand too.
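
One way to read the examples above is: zero-pad the revision number to five
digits and cut it into groups of two from the left. A sketch under that
assumption (rev_path, width and group are hypothetical names, and this is not
what FSFS actually does):

```python
def rev_path(rev, width=5, group=2):
    """Map a revision number to a nested path such as 00/10/1.

    The number is zero-padded to `width` digits and split into
    `group`-digit chunks from the left; the last chunk may be shorter.
    """
    digits = str(rev).zfill(width)
    parts = [digits[i:i + group] for i in range(0, len(digits), group)]
    return "/".join(parts)


for rev in (1, 2, 101, 10001):
    print("rev. %d: %s" % (rev, rev_path(rev)))
```

With width=5 the leaf level holds only ten entries per directory; a width of 6
would give 100 per directory at every level.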

With this in mind there should probably be an option saying *how many levels*
are wanted - which can be estimated by an administrator.
(Do you get 100 revisions max? Use none. Do you get 10 000? Use 1. More than
1M? Use 5 and you'll always be fine.)
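
As a rough sketch of how an administrator might estimate that number,
assuming every directory (leaf or not) holds at most 100 entries;
levels_needed and fanout are hypothetical names, not an existing option:

```python
def levels_needed(max_revisions, fanout=100):
    """Number of directory levels above the revision files.

    Zero levels means all revision files sit in one directory; each
    extra level multiplies the capacity by `fanout`.
    """
    levels = 0
    capacity = fanout
    while capacity < max_revisions:
        capacity *= fanout
        levels += 1
    return levels


for revs in (100, 10000, 1000000):
    print(revs, levels_needed(revs))
```

By this count 5 levels cover 100**6 revisions, which is why over-estimating
with 5 is always safe.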

With that concept, over-estimating doesn't hurt, because only the currently
active subset needs to be held in memory - and that'll likely be the last
100, 200, 300 revisions, i.e. 1 to 4 directory blocks.

But, as said in another mail: re-arranging afterwards would always be possible.

Well, sounds like a bikeshed .... :-)

Regards,

Phil

Received on Tue Mar 6 09:26:21 2007

This is an archived mail posted to the Subversion Dev mailing list.