Re: FSFS format 6 (was: Re: FSv2)

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Wed, 29 Dec 2010 01:58:42 +0100

On Sun, Dec 12, 2010 at 4:23 PM, Stefan Fuhrmann
<stefanfuhrmann_at_alice-dsl.de> wrote:
> On 19.10.2010 15:10, Daniel Shahaf wrote:
>>
>> Greg Stein wrote on Tue, Oct 19, 2010 at 04:31:42 -0400:
>>>
>>> Personally, I see [FSv2] as a broad swath of API changes to align our
>>> needs with the underlying storage. Trowbridge noted that our current
>>> API makes it *really* difficult to implement an effective backend. I'd
>>> also like to see a backend that allows for parallel PUTs during the
>>> commit process. Hyrum sees FSv2 as some kind of super-key-value
>>> storage with layers on top, allowing for various types of high-scaling
>>> mechanisms.
>>
>> At the retreat, stefan2 also had some thoughts about this...
>>
> [This is just a brain-dump for 1.8+]
>
> While working on the performance branch I made some
> observations concerning the way FSFS organizes data
> and how that could be changed for reduced I/O overhead.
>
> notes/fsfs-improvements.txt contains a summary of what
> could be done to improve FSFS before FS-NG. A later
> FS-NG implementation should then still benefit from the
> improvements.

+(number of fopen calls during a log operation)

I like this proposal a lot. As I mentioned before, we run our FSFS
back-end on a SAN over NFS (and I suspect we're not the only company
doing this). In this environment, SVN's server-side I/O (especially
the number of random reads and fopen calls during e.g. log) is often
the major bottleneck.

There is one question going around in my head though: won't you have
to change/rearrange a lot of the FS layer code (and maybe repos
layer?) to benefit from this new format?

The current code is written in a certain way, not particularly
optimized for this new format (I seem to remember "log" does around 10
fopen calls for every interesting rev file, each time reading a
different part of it). Also, if an operation needs to access many
revisions (like log or blame), it doesn't take advantage at all of the
fact that they might be in a single packed rev file. The pack file is
opened and seeked just as many times as the individual rev files
would have been in total.
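
To make the access pattern concrete, here is a toy sketch in Python.
It is not the actual FSFS code or on-disk format: the "pack" is just
revision blobs written back to back, and build_pack, CountingOpener,
read_per_revision and read_batched are hypothetical names invented for
this illustration. It only contrasts one open per revision with a
single open for the whole batch.

```python
import os
import tempfile


def build_pack(path, revs):
    """Write revision blobs back to back; return a list of (offset, length)."""
    index = []
    with open(path, "wb") as f:
        for data in revs:
            index.append((f.tell(), len(data)))
            f.write(data)
    return index


class CountingOpener:
    """Wraps open() so we can count how often the pack file is opened."""

    def __init__(self):
        self.opens = 0

    def open(self, path):
        self.opens += 1
        return open(path, "rb")


def read_per_revision(opener, path, index, wanted):
    """One open + seek per requested revision, even within a single pack."""
    out = []
    for rev in wanted:
        offset, length = index[rev]
        with opener.open(path) as f:
            f.seek(offset)
            out.append(f.read(length))
    return out


def read_batched(opener, path, index, wanted):
    """Open the pack once and visit the revisions in offset order."""
    result = {}
    with opener.open(path) as f:
        for rev in sorted(wanted, key=lambda r: index[r][0]):
            offset, length = index[rev]
            f.seek(offset)
            result[rev] = f.read(length)
    # Hand the data back in the order the caller asked for.
    return [result[rev] for rev in wanted]


def demo():
    revs = [("contents of r%d\n" % i).encode() for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        pack = os.path.join(tmp, "pack")
        index = build_pack(pack, revs)
        wanted = [4, 2, 0]  # e.g. an operation walking history backwards
        per_rev, batched = CountingOpener(), CountingOpener()
        naive = read_per_revision(per_rev, pack, index, wanted)
        grouped = read_batched(batched, pack, index, wanted)
        return naive, grouped, per_rev.opens, batched.opens
```

Both functions return the same data; the difference is only in how
often the file is opened. The batched variant needs to know the whole
set of revisions up front, which is exactly the kind of change the
calling code would have to make.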

So: how will the current code be able to take advantage of this new
format? Won't this require a major effort to restructure that code?

(This reminds me of the current difficulty, as I see it as an innocent
bystander, with the WC-NG rewrite: theoretically it should be very
fast, but the "higher level" code is still largely based on the old
principles. So to take advantage of it, certain things have to change
at the higher level, making operations work "dir-based" or
"tree-based" instead of file-based, etc.)

Cheers,

-- 
Johan
Received on 2010-12-29 01:59:38 CET

This is an archived mail posted to the Subversion Dev mailing list.