[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: file_handle_cache branch ready for review

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Mon, 09 Jan 2012 03:12:41 +0100

On 03.01.2012 10:01, Ivan Zhakov wrote:
> On Sun, Dec 18, 2011 at 22:46, Stefan Fuhrmann
> <stefanfuhrmann_at_alice-dsl.de> wrote:
>> For interested parties,
>>
>> Work on said branch is done so far and I think
>> it is generally ready for being merged into /trunk.
>> But it is certainly a good idea to give it some
>> detailed review, in particular with regard to side-
>> effects on e.g. FSFS recovery code.
>>
>> What it does in a nutshell:
>>
>> * to FSFS svn_fs_t, attaches a cache for open file
>> handles to reduce the number of open, lseek,
>> read and close operations
>> * svnserve CL and apache config option to set
>> the per-process maximum of cached handles
>> (default: 16)
>>
>> * use cached file handles for r/o access to revision
>> files in FSFS
>> * notify the cache after revision files have been
>> written ("flush" cache entries for these files)
>>
>> With the repository on a linux RAM disk, it speeds
>> up a "cold" export of TSVN /trunk by approx. 10%.
>> Slower disks may show larger gains.
>>
>> If there are no major objections, I would like to merge
>> the code into /trunk around mid-January.
>>
> Hi, Stefan!
>
> As I stated before I'm -1 for merging file-handles cache stuff to trunk:
> 1. The performance gain is not significant and may vary from
> environment to environment.

What would constitute a significant improvement?
Keep in mind that these measurements did not
involve any physical disk access or even things
like a SAN or even repositories on NFS shares.

 From what I have seen, the patch eliminates up to
~75% of I/O activity.
> 2. We introduce really dirty hacks. After merging your branch we have
> to worry about file handles to keep them in reusable state or do not
> forget to notify file handles cache to flush this handle.

That is indeed an issue. I think I can rework the code
to disable the cache as soon as a transaction starts.
The cache is mainly effective in read requests.
> 3. By design handles are not intended to be cached, underlying objects
> should be cached.
What would those underlying objects be? File buffers,
position of the file pointer? Well, that is *exactly* what
the cache does.

In fact, most of the performance improvement stems from
intelligently aligning and re-using existing data buffers
instead of re-reading the same data over and over.

> In your branch your introduce handles to handles.
APR does the same. Is that unreasonable?
> 4. It seems current implementation reuses file handles even error is
> occurred when working with the handle. It's potentially dangerous from
> my point view.
>
Interesting point. But I think it will not cause issues in
our case because we use it only within a FSFS "session".
Any I/O error should result in a failure of the respective FS
operation, in turn freeing the whole cache instance.

The whole problem with I/O improvement is that there
is probably no other way to make progress in the next
3 years: Some of it could be addressed within APR,
but that won't be available for Apache < 2.4 (2.6?).
A better solution is FSv2 but that is at least 3 years
away from now.

-- Stefan^2.
Received on 2012-01-09 03:13:27 CET

This is an archived mail posted to the Subversion Dev mailing list.