
Re: possible win32 fsfs performance

From: Mark Benedetto King <mbk_at_lowlatency.com>
Date: 2004-10-23 00:36:38 CEST

On Fri, Oct 22, 2004 at 03:28:57PM -0400, Greg Hudson wrote:
> On Fri, 2004-10-22 at 13:44, JS.staff wrote:
> > Aren't old rev files read-only? Only altered when that particular revision is committed?
> >
> > I'm not sure about the 'filesystem that doesn't suck' stuff... Presumably even Linux (Peace Be Upon It) suffers from inefficiencies when storing tens of thousands of small files (compared to fewer big ones).
> >
> > My idea was to make the 'JAR-ing' an occasional svnadmin routine, not something done daily or anything. Just for very old revisions.
>
> Well, we have to be careful in case anyone is accessing the repository
> while this agglomeration is occurring. Since we can't be certain when
> the reader is done accessing a rev file, we don't know when it's safe to
> delete them. (We could use the repository layer's recovery lock, which
> is currently not used for anything, but I'm actually thinking of
> throwing that out for FSFS repositories because it means
> locking-crippled underlying filesystems can't be used for read-only
> access.)
>
> Also, later revs contain a lot of <rev, offset> pointers to earlier
> revs. Updating those pointers at collection time would be a nightmare,
> so those pointers need to remain valid. So we would need a fast way of
> mapping a rev to a collection. I'm not sure how to do that without
> imposing artificial constraints on what revs can be collected. (For
> instance, if we say that all collections must be of revs numbered
> 10*N..10*N+9, then we could quickly figure out what collection a rev
> belongs to.)
>
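
To make the fixed-range constraint above concrete, here is a minimal
sketch of the rev-to-collection mapping. The names (REVS_PER_COLLECTION,
collection_for_rev, collection_range) are hypothetical and not part of
FSFS; the only thing taken from the quoted text is the 10*N..10*N+9 rule.

```python
# Hypothetical sketch: collections hold revs 10*N..10*N+9, as in the
# example constraint quoted above. None of these names exist in FSFS.
REVS_PER_COLLECTION = 10


def collection_for_rev(rev: int) -> int:
    """Return N such that rev lies in 10*N..10*N+9 (constant time)."""
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    return rev // REVS_PER_COLLECTION


def collection_range(n: int) -> range:
    """Return the revision numbers that collection N would hold."""
    start = n * REVS_PER_COLLECTION
    return range(start, start + REVS_PER_COLLECTION)
```

With this rule, an existing <rev, offset> pointer never needs rewriting:
the reader derives the collection from the rev number alone.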

Also, many collection file formats suffer O(N) entry lookups because
their internal file tables are not indexed. This can actually be
significantly worse than the filesystem's own lookup performance. Of
course, we don't need to use ZIP/JAR; we could build our own format
that doesn't exhibit this behaviour.
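
As a rough illustration of what a format without the O(N) lookup could
look like, the sketch below puts a fixed-width index in front of the
concatenated revisions, so a reader computes the index slot directly
from the rev number. The layout and the names (HEADER, ENTRY,
pack_collection, read_rev) are assumptions made up for this example,
not an existing or proposed Subversion format.

```python
import struct

# Hypothetical layout: header, then one fixed-width index entry per
# revision, then the revision bodies back to back.
HEADER = struct.Struct("<II")  # first revision number, revision count
ENTRY = struct.Struct("<QQ")   # absolute byte offset, byte length


def pack_collection(first_rev, rev_blobs):
    """Pack consecutive revisions (as bytes) behind a fixed-width index."""
    offset = HEADER.size + ENTRY.size * len(rev_blobs)
    index = []
    for blob in rev_blobs:
        index.append(ENTRY.pack(offset, len(blob)))
        offset += len(blob)
    header = HEADER.pack(first_rev, len(rev_blobs))
    return header + b"".join(index) + b"".join(rev_blobs)


def read_rev(collection, rev):
    """Fetch one revision by computing its index slot; no table scan."""
    first_rev, count = HEADER.unpack_from(collection, 0)
    slot = rev - first_rev
    if not 0 <= slot < count:
        raise KeyError(rev)
    entry_pos = HEADER.size + ENTRY.size * slot
    offset, length = ENTRY.unpack_from(collection, entry_pos)
    return collection[offset:offset + length]
```

For example, read_rev(pack_collection(120, [b"a", b"bb"]), 121) reads
exactly one index entry before slicing out the body. A real reader
would seek within the file rather than hold the whole collection in
memory.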

We might also be trading one problem for another. My guess is that
Filesystems That Suck Suck Thoroughly. The ones that don't like lots
of files in a directory probably also don't like big files (or they
work, but their seek performance is lousy). Aggregating revision
files together pushes us towards that dimension of Suck.

--ben

Received on Sat Oct 23 00:33:45 2004

This is an archived mail posted to the Subversion Dev mailing list.
