Re: FSFS format 6

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Wed, 29 Dec 2010 20:37:58 +0100

On 29.12.2010 01:58, Johan Corveleyn wrote:
> On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann
> <stefanfuhrmann_at_alice-dsl.de> wrote:
>> On 19.10.2010 15:10, Daniel Shahaf wrote:
>>> Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
>>>> Personally, I see [FSv2] as a broad swath of API changes to align our
>>>> needs with the underlying storage. Trowbridge noted that our current
>>>> API makes it *really* difficult to implement an effective backend. I'd
>>>> also like to see a backend that allows for parallel PUTs during the
>>>> commit process. Hyrum sees FSv2 as some kind of super-key-value
>>>> storage with layers on top, allowing for various types of high-scaling
>>>> mechanisms.
>>> At the retreat, stefan2 also had some thoughts about this...
>>>
>> [This is just a brain-dump for 1.8+]
>>
>> While working on the performance branch I made some
>> observations concerning the way FSFS organizes data
>> and how that could be changed for reduced I/O overhead.
>>
>> notes/fsfs-improvements.txt contains a summary of what
>> could be done to improve FSFS before FS-NG. A later
>> FS-NG implementation should then still benefit from the
>> improvements.
> +(number of fopen calls during a log operation)
>
> I like this proposal a lot. As I mentioned before, we are running
> our FSFS back-end on a SAN over NFS (and I suspect we're not the only
> company doing this). In this environment, the server-side I/O of SVN
> (especially the amount of random reads and fopen calls during e.g.
> log) is often the major bottleneck.
>
> There is one question going around in my head though: won't you have
> to change/rearrange a lot of the FS layer code (and maybe repos
> layer?) to benefit from this new format?
Maybe. But as far as I understand the current
FSFS structure, data access is mainly chasing
pointers, i.e. reading relative or absolute byte
offsets and moving there for the next piece of
information. If everything goes well, none of that
code needs to change; the revision packing
algorithm will simply produce different offset
values.
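
To illustrate the pointer chasing, here is a minimal sketch in
Python. It uses a toy record layout, not the real FSFS on-disk
format: HEADER, END, pack() and read_chain() are hypothetical
names made up for this example. The point is only that a reader
which follows stored offsets does not care in which physical
order the packer wrote the records.

```python
import struct

# Toy record layout (hypothetical, NOT the real FSFS format):
#   8-byte big-endian absolute offset of the next record,
#   2-byte payload length, then the payload bytes.
HEADER = struct.Struct(">QH")
END = 0xFFFFFFFFFFFFFFFF  # sentinel offset: no next record


def pack(payloads, order):
    """Write the payloads in the physical order given by `order`
    (a permutation of their indices) while linking them in logical
    order. Returns (file_bytes, offset_of_first_logical_record)."""
    sizes = [HEADER.size + len(p) for p in payloads]
    offsets = {}
    pos = 0
    for idx in order:
        offsets[idx] = pos
        pos += sizes[idx]
    buf = bytearray(pos)
    for idx, payload in enumerate(payloads):
        nxt = offsets[idx + 1] if idx + 1 < len(payloads) else END
        start = offsets[idx]
        HEADER.pack_into(buf, start, nxt, len(payload))
        buf[start + HEADER.size:start + sizes[idx]] = payload
    return bytes(buf), offsets[0]


def read_chain(f, offset):
    """Reader: seek to an offset, read one record, then follow the
    offset stored in it. Knows nothing about the physical layout."""
    out = []
    while offset != END:
        f.seek(offset)
        offset, length = HEADER.unpack(f.read(HEADER.size))
        out.append(f.read(length))
    return out
```

Packing the same payloads with two different values of `order`
yields different bytes and different offsets, yet read_chain()
returns the same list for both.
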
> The current code is written in a certain way, not particularly
> optimized for this new format (I seem to remember "log" does around 10
> fopen calls for every interesting rev file, each time reading a
> different part of it). Also, if an operation currently needs to access
> many revisions (like log or blame), it doesn't take advantage at all
> of the fact that they might be in a single packed rev file. The pack
> file is opened and seeked in just as much as the sum of the individual
> rev files.
The fopen() calls should be eliminated by the
file handle cache. IOW, they should already be
addressed on the performance branch. Please
let me know if that is not the case.
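
As a rough illustration of the idea (not the code on the
performance branch, which is C), a file handle cache can be
sketched in Python as below. The class name, the LRU eviction
policy and the `opens` counter are assumptions made for this
example only.

```python
from collections import OrderedDict


class FileHandleCache:
    """Keep up to `capacity` files open and reuse their handles.
    Hypothetical sketch; LRU eviction is an assumption."""

    def __init__(self, capacity=16, opener=open):
        self.capacity = capacity
        self.opener = opener          # injectable for testing
        self.handles = OrderedDict()  # path -> open file object
        self.opens = 0                # number of real open calls

    def get(self, path):
        f = self.handles.pop(path, None)
        if f is None:
            f = self.opener(path, "rb")
            self.opens += 1
        self.handles[path] = f        # most recently used goes last
        while len(self.handles) > self.capacity:
            _, oldest = self.handles.popitem(last=False)
            oldest.close()
        return f

    def close(self):
        """Close every cached handle."""
        while self.handles:
            _, f = self.handles.popitem()
            f.close()
```

With such a cache, ten accesses to different parts of the same
rev file cost one open call plus ten seeks, instead of ten open
calls.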

FSFS format 6 would primarily reduce the number
of seek() and read() calls. Once the seek() calls
are "in check", the size of the read buffer might
become configurable: remote file access might
benefit from larger buffers, e.g. sized to what the
network can transfer in 1 to 10 ms.
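
For a feeling of the numbers, here is a small Python sketch of
that rule of thumb. The function name, the 1 Gbit/s link speed
and the rounding to a 4 KiB multiple are assumptions chosen for
illustration; none of them is an existing FSFS setting.

```python
def read_buffer_size(throughput_bits_per_s, window_ms, granularity=4096):
    """Bytes the link can move in `window_ms` milliseconds,
    rounded up to a multiple of `granularity`."""
    raw = throughput_bits_per_s // 8 * window_ms // 1000
    rounded = -(-raw // granularity) * granularity  # ceiling division
    return max(granularity, rounded)


GIGABIT = 1_000_000_000  # assumed link speed, bits per second

print(read_buffer_size(GIGABIT, 1))   # 126976  (about 124 KiB)
print(read_buffer_size(GIGABIT, 10))  # 1253376 (about 1.2 MiB)
```

So on an assumed gigabit link, the 1 to 10 ms window corresponds
to buffers of roughly 125 kB to 1.25 MB.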
> So: how will the current code be able to take advantage of this new
> format? Won't this require a major effort to restructure that code?
Old servers won't be able to read format 6 repos
(maybe they will, but there is no guarantee). If a
large-scale restructuring of the code were
necessary, I might not be able to do and validate it.

The packing code, however, will probably be
completely rewritten.
> (This reminds me of the current difficulty (as I can see it, as an
> innocent bystander) with the WC-NG rewrite: theoretically it should be
> very fast, but the "higher level" code is still largely based upon the
> old principles. So to take advantage of it, certain things have to be
> changed at the higher level, making operations work "dir-based" or
> "tree-based", instead of file-based etc).
Well, the official goal is still to make 1.7 clients
faster than 1.6 for every operation. But there will
certainly be room for improvement in 1.8.

-- Stefan^2.
Received on 2010-12-29 20:52:07 CET
