
Re: FSFS format 6

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Mon, 21 Feb 2011 07:32:11 +0100

On 20.02.2011 09:50, Ivan Zhakov wrote:
> On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eqfox_at_web.de> wrote:
>> The fopen() calls should be eliminated by the
>> file handle cache. IOW, they should already be
>> addressed on the performance branch. Please
>> let me know if that is not the case.
>>
> Just my 20 cents.
High roller.
> My belief that file handles cache should be implemented at OS level
> and I pretty sure that it's implemented.
You certainly have data to demonstrate your claim?

In fact, fopen() is extremely expensive (1 to 5 ms) on file
systems with ACLs. Even on a local, low-overhead file system
(ext3), the effect of handle caching is significant:

time ./svnadmin verify $TSVN_MIRROR -q -F 256 -M 0
real 1m46.603s
user 1m43.474s
sys 0m3.132s

time ./svnadmin verify $TSVN_MIRROR -q -F 0 -M 0
real 2m26.664s
user 2m0.856s
sys 0m25.818s

Note that the gains are split about 50:50 between the OS
and the application. Things become even more interesting,
albeit less easily demonstrable, with concurrent queries
being run by a threaded server. One would expect an even
higher level of reuse.
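
For reference, the split can be worked out from the two `time`
outputs above. This is only a small sketch: it treats `sys` time as
the OS share and `user` time as the application share, which is an
interpretation of the readings rather than something measured
separately.

```python
def seconds(minutes, secs):
    """Convert a `time`-style XmY.YYYs reading to seconds."""
    return minutes * 60 + secs

# Readings copied from the two runs above.
with_cache = {
    "real": seconds(1, 46.603),
    "user": seconds(1, 43.474),
    "sys": seconds(0, 3.132),
}
without_cache = {
    "real": seconds(2, 26.664),
    "user": seconds(2, 0.856),
    "sys": seconds(0, 25.818),
}

# Time saved by handle caching, per category.
saved = {key: without_cache[key] - with_cache[key] for key in with_cache}
cpu_saved = saved["user"] + saved["sys"]

print(f"real saved: {saved['real']:.3f}s")
print(f"user saved: {saved['user']:.3f}s ({saved['user'] / cpu_saved:.0%})")
print(f"sys  saved: {saved['sys']:.3f}s ({saved['sys'] / cpu_saved:.0%})")
```

That comes to roughly 17.4 s of user time and 22.7 s of system time,
out of about 40 s of wall-clock time saved.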
> And right way to eliminate
> number of duplicate fopen()/reads() is improving our FS API.
Why would that be necessary if the OS already takes care
of all the optimizations?

FSFS6 is about optimizing the interface between OS and
the FSFS code: Fewer seek()s and drastically reduced
number of read()s.

Once that is in place and its behavior well understood, we
may start designing I/O aggregation and scheduling. In
particular, holding off requests while another request is already
fetching the desired data will be a very interesting task.
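
To illustrate what "holding off requests" could look like, here is a
minimal sketch of request coalescing. It is not code from the
performance branch; `SingleFlight`, `fetch` and `loader` are
hypothetical names, and the example assumes a threaded server where
several threads may ask for the same key at the same time.

```python
import threading

class SingleFlight:
    """Let one caller fetch a key while concurrent callers wait for its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> shared state of the fetch in progress

    def fetch(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None,
                        "error": None, "waiters": 0}
                self._inflight[key] = call
            else:
                call["waiters"] += 1

        if leader:
            # First request for this key: do the actual I/O.
            try:
                call["value"] = loader(key)
            except Exception as exc:
                call["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call["done"].set()
        else:
            # Hold off until the leader has fetched the data.
            call["done"].wait()

        if call["error"] is not None:
            raise call["error"]
        return call["value"]
```

Requests that do not overlap in time each run their own `loader`
call; only overlapping requests for the same key share one.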

From what I understood of the FS API, there is very little
that needs to be added to allow for effective I/O optimization.
Basically, a simple "advise" or "prefetch" option on the
read functions could possibly do the trick.
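
As a rough sketch of such an option, assuming a POSIX system: the
function `read_range` and its `prefetch` parameter below are
hypothetical and not part of the FS API. The hint is passed on to
the OS via posix_fadvise() where available and is purely advisory.

```python
import os

def read_range(fd, offset, length, prefetch=None):
    """Read `length` bytes at `offset`; optionally hint at the next range.

    `prefetch` is a hypothetical (offset, length) pair the caller expects
    to need soon. The hint is advisory: the OS may ignore it.
    """
    if prefetch is not None and hasattr(os, "posix_fadvise"):
        next_offset, next_length = prefetch
        os.posix_fadvise(fd, next_offset, next_length, os.POSIX_FADV_WILLNEED)

    if hasattr(os, "pread"):
        # Positioned read: no separate seek, file offset left untouched.
        return os.pread(fd, length, offset)

    # Fallback for platforms without pread().
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, length)
```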

If we get to that stage, I'm sure to receive "the OS should
take care of I/O scheduling and stuff" posts.
> I didn't reviewed how file handles cache is implemented in
> fs-performance branch, but I'm nearly to -1 against implementing cache
> of open file handles in Subversion.
File handle caching definitely has its drawbacks, and risks
in particular. The number of file handles within an OS
instance is quite limited (typically around 1000), and open
files may prevent file deletion (e.g. during packing). The code
is supposed to take care of the latter but may be faulty.
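
Both concerns can be shown in a small sketch. This is not the
implementation on the performance branch; `FileHandleCache`,
`max_open` and `invalidate` are hypothetical names. It is also
deliberately simplified: it is not thread-safe, and a handle that
gets evicted is closed even if a caller still holds it.

```python
from collections import OrderedDict

class FileHandleCache:
    """Keep at most `max_open` files open, closing the least recently used."""

    def __init__(self, max_open=100):
        self._max_open = max_open
        self._handles = OrderedDict()  # path -> open file object

    def get(self, path):
        handle = self._handles.pop(path, None)
        if handle is None:
            handle = open(path, "rb")
            # Make room first so the handle limit is never exceeded.
            while self._handles and len(self._handles) >= self._max_open:
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
        self._handles[path] = handle  # re-insert as most recently used
        return handle

    def invalidate(self, path):
        """Close the cached handle so the file can be deleted or replaced."""
        handle = self._handles.pop(path, None)
        if handle is not None:
            handle.close()

    def __len__(self):
        return len(self._handles)
```

A caller that is about to remove a file (for example while packing)
would call `invalidate(path)` first. Forgetting that call is exactly
the kind of fault mentioned above.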

Alternative designs are welcome.

-- Stefan^2.
Received on 2011-02-21 07:32:42 CET

This is an archived mail posted to the Subversion Dev mailing list.
