
Re: Subdirectories for db/revs/* on fsfs backend

From: Benjamin Pflugmann <benjamin-svn-dev_at_pflugmann.de>
Date: 2004-07-21 01:04:03 CEST

On Wed 2004-07-21 at 00:05:47 +0200, Sven Mueller wrote:
> Greg Hudson [u] schrieb:
> >On Tue, 2004-07-20 at 14:34, Sven Mueller wrote:
> >
> >>Oh well. Whoever designed the fsfs backend was clearly not used to
> >>managing huge amounts of files then.
> >
> >I have it on good authority that he's used to filesystems which do their
> >job well.
>
> Wasn't meant as an accusation, but now that I read my own words again,
> it sounded like one. sorry.
> Anyway, what I really meant was: That person (you?) overlooked the
> pretty well known problems with huge directories.

No, it was not overlooked:

----------------------------------------------------------------------
On Tue, 25 May 2004 13:32:07 -0400 Greg Hudson wrote:
> On Tue, 2004-05-25 at 12:11, kfogel@collab.net wrote:
>
> > If a repository has 30,000 revisions, how does FSFS do? Is there a
> > directory somewhere that has 30,000 entries (in the regular
> > filesystem, I mean, not the versioned filesystem, of course)?
>
> Yes, you get a big directory.
>
> > Or is there some sort of subdir'ing that changes the potential O(N)
> > problem here to O(log(N)) instead?
>
> This could be implemented in a backward-compatible fashion in the
> future, so I'd like to wait until someone demonstrates a practical
> problem before adding this bit of complexity.
----------------------------------------------------------------------

You could argue it was underestimated. But then, IIRC, you said the
problems are with third-party software rather than with Subversion
itself.

> However it is pretty difficult to change FSFS layout now without
> breaking existing installations.

Well, back then, Greg thought that this was changeable in a compatible
way, which is probably why he considered "sitting it out until
problems get reported" to be acceptable to begin with.

Dunno if he has changed his mind about the feasibility of making this
change in a compatible way.
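
To make the "subdirectories, but backward-compatible" idea concrete,
here is a minimal sketch in Python. It is not how FSFS is implemented;
the function name rev_path, the shard size of 1000 and the "look in the
flat directory first" fallback are all assumptions made up for
illustration.

```python
import os


def rev_path(revs_dir, rev, shard_size=1000):
    """Return the on-disk path for revision `rev` (hypothetical layout).

    Old repositories keep every revision file directly in `revs_dir`.
    A new layout would group them into subdirectories of `shard_size`
    revisions each, e.g. revs/1/1234 for revision 1234.
    """
    flat = os.path.join(revs_dir, str(rev))
    sharded = os.path.join(revs_dir, str(rev // shard_size), str(rev))
    # Backward compatibility: a file that already exists in the old
    # flat layout is still found there. Everything else, including
    # revisions not written yet, resolves to the sharded location.
    if os.path.exists(flat) and not os.path.exists(sharded):
        return flat
    return sharded
```

With something like this, an existing repository keeps working
unchanged while new revisions land in small directories, at the cost of
one extra existence check per lookup.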

Bye,

        Benjamin.

PS: Out of curiosity, I did some measurements (on an old Athlon 800 with
some 20GB IDE disc, ext3, 100MBit). In a directory with 37000 empty files:

  "ls" returns at once (about 0.2s),
  "ls -l" needs 1.0s
  "ls" over NFS needs 1.3s (NFS server is on another computer)
  "ls -l" over NFS needs ~280s
          and transfers 5-6MB in each direction (according to ifconfig)
  "ls -l" over NFS for 10,000 files needs 23s and transfers 1.5MB

(everything wall time; output > /dev/null)
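
For anyone who wants to repeat this kind of test, a rough Python
equivalent might look like the sketch below. It is not the procedure
used for the numbers above (those came from plain "ls"); the function
name time_listing and the optional `parent` argument are assumptions.
Listing names only corresponds roughly to "ls", and the stat loop to
the extra per-file work of "ls -l".

```python
import os
import tempfile
import time


def time_listing(n_files, parent=None):
    """Create n_files empty files, then time a plain listing and a
    listing followed by one stat() per entry. Returns
    (entry_count, seconds_for_listing, seconds_for_stat_loop).

    `parent` selects where the scratch directory is created, e.g. a
    path on an NFS mount; None means the system temp directory.
    """
    with tempfile.TemporaryDirectory(dir=parent) as d:
        for i in range(n_files):
            open(os.path.join(d, str(i)), "w").close()

        t0 = time.perf_counter()
        names = os.listdir(d)          # names only, like "ls"
        t_list = time.perf_counter() - t0

        t0 = time.perf_counter()
        for name in names:             # one stat per file, like "ls -l"
            os.stat(os.path.join(d, name))
        t_stat = time.perf_counter() - t0

    return len(names), t_list, t_stat
```

Calling time_listing(37000) locally and again with `parent` pointing at
an NFS mount gives the two cases to compare. Freshly created files are
likely to be cached, so the results will tend to look better than a
cold run.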

So the bottleneck I see with this limited test is NFS when you stat a
lot of files (no surprise) and apparently it has non-linear scaling.
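
To put a number on "non-linear": if time grows like N^k, the two NFS
"ls -l" timings above give an estimate for k. This is only a
back-of-the-envelope calculation from two data points, and the helper
name scaling_exponent is made up for the example.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ N**k from two (file count, seconds) pairs."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# "ls -l" over NFS: 10,000 files in 23s, 37,000 files in about 280s
k = scaling_exponent(10_000, 23.0, 37_000, 280.0)
print(round(k, 2))  # about 1.91
```

So 3.7 times the files cost roughly 12 times the wall time, which is
closer to quadratic than to linear growth, at least between these two
points.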

Received on Wed Jul 21 02:51:48 2004

This is an archived mail posted to the Subversion Dev mailing list.
