
Re: FSFS format 6

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Sun, 20 Feb 2011 21:02:07 +0100

On Sun, Feb 20, 2011 at 6:35 PM, Mark Mielke <mark_at_mark.mielke.cc> wrote:
> On 02/20/2011 03:50 AM, Ivan Zhakov wrote:
>>
>> On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eqfox_at_web.de>  wrote:
>>>
>>> The fopen() calls should be eliminated by the
>>> file handle cache. IOW, they should already be
>>> addressed on the performance branch. Please
>>> let me know if that is not the case.
>>
>> My belief is that a file handle cache should be implemented at the OS
>> level, and I'm pretty sure that it is. The right way to eliminate the
>> duplicate fopen()/read() calls is to improve our FS API.
>>
>> I haven't reviewed how the file handle cache is implemented in the
>> fs-performance branch, but I'm close to -1 on implementing a cache
>> of open file handles in Subversion.
>
> What OS implements file handle caching? The OS file system layer for most
> operating systems does implement caching - but open()/close() can easily
> invalidate some or all of this cache due to required POSIX behaviour,
> especially if the backend storage is remote and shared between multiple
> clients such as would be the case over NFS. This is required to implement
> consistency across clients. The local operating system cannot arbitrarily
> cache everything, and every bit of data it does decide to cache could be
> wrong at any point in time without other aspects in use such as file
> locking.
>
> Of particular concern to me is how slow Subversion gets over NFS, and this
> thread grabbed my attention as a result. When using NFS, Subversion
> operations can take many times longer (20 seconds -> 20 minutes). I think
> people may be testing and making assumptions that a "local file system" will
> be in use. Do people working on the fs-performance branch check with NFS?
>
> I don't know... just dropping in... feel free to set me straight. :-)

Hi Mark,

You're absolutely right, some Subversion operations perform horribly
with FSFS over NFS (we have such a setup at work). In fact, the poor
performance of e.g. "svn log somefile" on NFS was one of the problems
I was first interested in when looking at svn (and one of the reasons
I got involved with svn development, a positive side-effect :-)).

On our setup at work, "svn log" is about 10 times slower when done
over NFS than on local disk. As I described in this thread (and in
some earlier threads), "svn log somefile" opens and closes each rev
file about 20 times (and the situation is no better with a packed
repository, because the packed file is opened/closed just as many
times), and that seems to be very expensive when working over NFS.
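
To make the access pattern concrete, here is a minimal sketch in Python
(not Subversion code). The function names, the offsets and the temporary
file are all made up for illustration; the only point is the difference
between reopening a file for every read and opening it once.

```python
import os
import tempfile


def read_reopening(path, offsets, size):
    """Open and close the file once per read (the pattern described above)."""
    opens = 0
    chunks = []
    for offset in offsets:
        with open(path, "rb") as fh:
            opens += 1
            fh.seek(offset)
            chunks.append(fh.read(size))
    return chunks, opens


def read_single_open(path, offsets, size):
    """Open the file once and serve every read from the same handle."""
    chunks = []
    with open(path, "rb") as fh:
        for offset in offsets:
            fh.seek(offset)
            chunks.append(fh.read(size))
    return chunks, 1


def demo():
    """Compare both strategies on a throwaway file standing in for a rev file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4)
        path = tmp.name
    try:
        offsets = [i * 10 for i in range(20)]  # 20 hypothetical item offsets
        a, opens_a = read_reopening(path, offsets, 8)
        b, opens_b = read_single_open(path, offsets, 8)
        return a == b, opens_a, opens_b
    finally:
        os.unlink(path)
```

Both functions return the same data; only the number of open/close
round trips differs, and that is the part that seems to hurt on NFS.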

I haven't been able to test the performance branch (with the file
handle caching) on our NFS setup at work. I have only measured the
number of fopen() calls for an "svn log" operation, compared to trunk,
assuming that is *the* most critical performance differentiator for
NFS setups.

If someone could do some real measurements/benchmarks of "svn log"
(and other operations of course) of the performance branch on an NFS
setup, compared with trunk (and maybe also compare them with a similar
setup with FSFS on local disk), that could be very interesting...
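
For anyone willing to try, a rough timing harness could look like the
sketch below. It is only an assumption about how one might measure this,
not an established benchmark: time_command and summarize are names I made
up, and the command to run (for instance "svn log" against a repository on
NFS and against one on local disk) has to be filled in by the tester.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run cmd several times and return the wall-clock duration of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal speed does not distort the numbers.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of durations (seconds) to min / median / max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

The first run usually includes cold caches, so it is worth reporting the
individual timings as well as the summary.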

> That said, I'm also (in principle) against implementing cache of open file
> handles. I prefer architectures that cache intermediate data in a processed
> form that the application has made a determined choice to make use of such
> that the cache is the most useful to the application, rather than a
> transparent caching layer that guesses at what is safe. The OS file system
> layer is exactly this - any caching it does is transparent to the
> application and a guess. Guesses are dangerous, which is exactly why the OS
> file system layer cannot do as much caching unless it has 100% control of
> the file system (= local file system).

I agree that it would be best if the architecture were such that svn
could organize its work for most use cases in a way that's efficient
for the lower levels of the system. For instance, for "svn log", svn
should in theory be able to do its work with exactly 1 open/close per
rev file (or in a packed repository, maybe even only 1 open/close per
packed file).
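
As a sketch of what "1 open/close per file" means in practice (Python, not
Subversion code, and not a proposal for the actual FS API): collect all the
reads that are needed, group them by file, and then open each file exactly
once. The request format (path, offset, size) and the name read_grouped are
assumptions made for the example.

```python
from collections import defaultdict


def read_grouped(requests):
    """Serve (path, offset, size) requests with one open per distinct file.

    Returns the data in the original request order plus the number of opens.
    """
    by_path = defaultdict(list)
    for index, (path, offset, size) in enumerate(requests):
        by_path[path].append((offset, size, index))

    results = [None] * len(requests)
    opens = 0
    for path, items in by_path.items():
        with open(path, "rb") as fh:
            opens += 1
            # Read in ascending offset order to keep access sequential.
            for offset, size, index in sorted(items):
                fh.seek(offset)
                results[index] = fh.read(size)
    return results, opens
```

The catch is that the caller has to know all the reads up front, which is
exactly what the current layering does not make easy.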

But right now, this isn't the case, and I think getting there would be
a huge amount of work (changes in architecture, layering, ...). Until that
happens, I think such a generic file-handle caching layer could prove
very helpful :-). Note though that, if I understood correctly, the
file-handle caching of the performance branch will not be reintegrated
into 1.7, but maybe 1.8 ...
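
To illustrate the general idea of a generic file-handle cache, here is a
small least-recently-used sketch in Python. This is NOT how the
performance branch implements it (I have not described that code here);
the class name, the capacity parameter and the eviction policy are all
assumptions chosen for the example.

```python
from collections import OrderedDict


class FileHandleCache:
    """Keep up to `capacity` read-only handles open, evicting the least
    recently used one when the limit is exceeded."""

    def __init__(self, capacity=16):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._handles = OrderedDict()  # path -> open file object
        self.opens = 0                 # number of real open() calls made

    def get(self, path):
        fh = self._handles.get(path)
        if fh is not None:
            self._handles.move_to_end(path)  # mark as most recently used
            return fh
        fh = open(path, "rb")
        self.opens += 1
        self._handles[path] = fh
        if len(self._handles) > self.capacity:
            _, oldest = self._handles.popitem(last=False)
            oldest.close()
        return fh

    def read(self, path, offset, size):
        fh = self.get(path)
        fh.seek(offset)
        return fh.read(size)

    def close(self):
        """Close every cached handle."""
        while self._handles:
            _, fh = self._handles.popitem()
            fh.close()
```

Callers just use cache.read(path, offset, size) instead of opening the
file themselves. A handle that stays open is not revalidated the way a
fresh open() is, so the consistency concerns raised earlier in this
thread still have to be weighed against the saved round trips.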

But maybe stefan2 can comment more on that :-).

Cheers,

-- 
Johan
Received on 2011-02-20 21:03:04 CET

This is an archived mail posted to the Subversion Dev mailing list.
