
Re: svn back-end question

From: Steve Greenland <steveg_at_lsli.com>
Date: 2005-01-21 00:35:05 CET

On Thu, Jan 20, 2005 at 05:01:00PM -0600, Sean Laurent wrote:
> For those of us who know very little about filesystem level options, could
> someone tell us more about dir_index? Reading the tune2fs man page tells me
> that the dir_index option tells ext3 to "use hashed b-trees to speed up
> lookups in large directories."

Traditional Unix filesystems store a directory as a linear list of
entries. Adding a new file adds a new entry at the end of the list
(assuming no files have been deleted...). Therefore the time to look
up the last file added grows linearly with the number of files, and
the average cost of a lookup is O(N). Using a btree would drop that to
O(log(N)), and using a hash would be O(1), so I'm guessing that a
"hashed btree" is somewhere in between.
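
To make the difference concrete, here is a rough Python sketch that
counts comparisons for the three strategies. It is a toy model, not
how ext3 is implemented: the entry names are made up, and a binary
search over a sorted list stands in for a balanced tree.

```python
def linear_lookup(entries, name):
    """Scan an unsorted list; return (found, comparisons made)."""
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry == name:
            return True, comparisons
    return False, comparisons


def sorted_lookup(sorted_entries, name):
    """Binary search over a sorted list (stand-in for a balanced tree)."""
    lo, hi = 0, len(sorted_entries)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_entries[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_entries) and sorted_entries[lo] == name
    return found, comparisons


def hashed_lookup(table, name):
    """Set membership: expected constant time, independent of size."""
    return name in table


# Hypothetical directory of 10000 entries; look up the last one added.
entries = [f"rev{i:05d}" for i in range(10000)]
target = entries[-1]

found_linear, cost_linear = linear_lookup(entries, target)
found_sorted, cost_sorted = sorted_lookup(sorted(entries), target)
found_hashed = hashed_lookup(set(entries), target)

print("linear comparisons:", cost_linear)
print("binary-search comparisons:", cost_sorted)
print("hash lookup found:", found_hashed)
```

With 10000 entries the linear scan has to walk the whole list to
reach the last name, while the binary search needs at most 14 steps.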

> How significantly does this speed up lookups?

For big directories (1000s or 10000s of files), lots.

> Is there a penalty for lookups in small directories?

Probably, but also probably indiscernible except via micro-benchmarks. I
doubt Linus would merge anything like this that caused a noticeable
slow-down for the common case. (But hashes and trees are generally
performance wins even for small data-set sizes; they lose only because
of complexity and memory usage issues.)

If you really care, you can search the LKML archives; here's a link to
one of Ted Ts'o's messages to give you a starting point:

http://lwn.net/Articles/11481/

(which notes that you don't even have to do the mkdir/copy/mv trick; you
can use tune2fs and e2fsck to convert existing directories.)
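
As a sketch of what that conversion looks like, the Python snippet
below only builds and prints the two command lines; it does not run
them. The device name /dev/sdXN is a placeholder, and I'm assuming
the usual sequence of enabling the feature with tune2fs and then
having e2fsck rebuild the existing directories. Check the man pages
and run e2fsck only on an unmounted filesystem.

```python
import shlex


def dir_index_conversion_commands(device):
    """Return the conversion commands as argument lists, without running them."""
    return [
        # Turn on the dir_index feature in the superblock.
        ["tune2fs", "-O", "dir_index", device],
        # Force a full check (-f) and optimize/reindex directories (-D).
        ["e2fsck", "-f", "-D", device],
    ]


# "/dev/sdXN" is a hypothetical placeholder, not a real device.
for cmd in dir_index_conversion_commands("/dev/sdXN"):
    print(shlex.join(cmd))
```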

> Looking back through the Subversion archives, I only see the one discussion
> from last December where dir_index was mentioned. Has anyone done any
> testing or comparisons of FSFS with and without dir_index enabled?

IIRC, the FSFS developers said they were already doing things to
mitigate the problem, and you have to have pretty big repos with lots of
revisions to even notice.

If/when I create a new repo on an ext3 FS, I'll make sure that dir_index
is enabled, just in case. But I wouldn't bother to convert an existing
system until I saw a problem.
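
To see whether dir_index is already on, you can look at the
"Filesystem features" line that "tune2fs -l" prints. Below is a small
Python sketch that parses such a listing. The sample text is
illustrative, not copied from a real system, and has_dir_index is a
name I made up for this example.

```python
def has_dir_index(tune2fs_listing):
    """Return True if the 'Filesystem features' line lists dir_index."""
    for line in tune2fs_listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return "dir_index" in value.split()
    return False


# Illustrative listing; in practice the text would come from running
# "tune2fs -l" on the device, e.g. via subprocess.run(..., text=True).
sample_listing = (
    "Filesystem volume name:   <none>\n"
    "Filesystem features:      has_journal dir_index filetype\n"
)

print(has_dir_index(sample_listing))
```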

Steve

-- 
"Outlook not so good." That magic 8-ball knows everything! I'll ask
about Exchange Server next.
                           -- (Stolen from the net)
Received on Fri Jan 21 00:37:26 2005

This is an archived mail posted to the Subversion Users mailing list.
