
About fsfs-reorg - experimenting to reduce "cold I/O" (was: warnings in fsfs-reorg.c)

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 16 Oct 2012 11:00:46 +0200

[ Just changing the subject to get better visibility, so maybe more
people will read this and do some experiments with fsfs-reorg (not
with production data of course), to check the impact on "cold I/O".
Thanks for the explanation, Stefan. ]

On Sat, Oct 13, 2012 at 7:06 PM, Stefan Fuhrmann
<stefan.fuhrmann_at_wandisco.com> wrote:
> On Thu, Oct 11, 2012 at 1:32 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> On Wed, Oct 10, 2012 at 7:09 PM, Stefan Fuhrmann
>> <stefan.fuhrmann_at_wandisco.com> wrote:
>> > BTW, that code is not supposed to be *ever*
>> > used for production data.
>> Ok, good to know. I just executed the tool and saw the prominent
>> warning, so that's pretty clear.
> What I'm trying to say goes even beyond that.
> This tool will (probably) never evolve into something
> that would be used outside our dev community.
>> [ ... ]
>> > Would be nice if people could use it to test /
>> > evaluate the results. The whole idea is to verify
>> > the method before attempting significant changes
>> > to the FSFS layer in 1.9.
>> Can you summarize a bit (maybe you explained it already in some notes
>> file, but I don't quite remember) what it does again? What's the goal
>> really? Is it about reshuffling the data inside the pack files to be
>> more I/O efficient, while maintaining compatibility with existing
>> servers (so a reorg'ed repository can be read by any 1.x server)? If
>> so, how does it do that actually?
> SVN 1.8 will have 100% cache coverage in the sense
> that except for the format, fsfs.conf and friends, you
> can serve all r/o requests from the cache once it has
> been populated.
> The next logical step is to reduce the amount of I/O
> (physical seeks as well as data transfer). The basic
> idea is laid out in the fsfs-improvements notes, but the
> tool implementation goes a bit beyond that:
> * "overlay" revisions within a pack file, i.e. the offset
> ranges overlap in the physical file
> * put all the "changes" lists at the beginning of the pack file
> (used for log only)
> * starting at /@HEAD, add node-rev, followed by reps
> (in delta-order). Once a node is complete, continue
> with its youngest sub-node until the tree is complete
> * Continue with the youngest element not covered.
> The output should be compatible with SVN 1.6+
> (if the input was). Older formats are not supported -
> for simplicity.
> As a result, many related rep deltas should sit next to
> each other. Also, elements relevant for newer nodes
> should be at the beginning of the file and older ones
> tend to be moved to the end. Finally, we keep nodes
> that are next to each other in the tree close to one
> another in the resulting pack file.
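
The placement order described above can be sketched roughly as follows. This is a toy model and is not taken from fsfs-reorg.c: the `NodeRev` class, the `place_tree` and `layout` helpers, and the item labels are all hypothetical, each node's reps are assumed to be listed in delta order already, and the "changes" lists and the overlaying of revisions are left out.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRev:
    """Hypothetical stand-in for a node-rev; not an FSFS structure."""
    name: str                                      # label such as "README@3"
    revision: int                                  # revision that created it
    reps: list = field(default_factory=list)       # rep labels, assumed delta order
    children: list = field(default_factory=list)   # child NodeRev objects


def place_tree(root, placed, order):
    """Emit a node-rev, then its reps, then its sub-nodes, youngest first."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name in placed:
            continue
        placed.add(node.name)
        order.append(node.name)
        for rep in node.reps:
            if rep not in placed:
                placed.add(rep)
                order.append(rep)
        # Push the oldest child first so the youngest is popped next.
        for child in sorted(node.children, key=lambda c: c.revision):
            stack.append(child)


def layout(all_node_revs, head_root):
    """Return item labels in the order they would be written to the pack."""
    placed, order = set(), []
    # Start at /@HEAD and complete that tree first.
    place_tree(head_root, placed, order)
    # Then continue with the youngest element not covered yet.
    for node in sorted(all_node_revs, key=lambda n: n.revision, reverse=True):
        place_tree(node, placed, order)
    return order


# Tiny made-up history: r1 adds README, r2 adds main.c, r3 changes README.
readme1 = NodeRev("README@1", 1, ["rep:README@1"])
root1 = NodeRev("/@1", 1, ["rep:/@1"], [readme1])
main2 = NodeRev("main.c@2", 2, ["rep:main.c@2"])
readme3 = NodeRev("README@3", 3, ["rep:README@3"])
root3 = NodeRev("/@3", 3, ["rep:/@3"], [main2, readme3])

order = layout([root1, readme1, main2, root3, readme3], root3)
print("\n".join(order))
```

In this toy history the items reachable from /@3 come out first, and the revision 1 items that HEAD no longer references end up at the tail, which matches the "newer at the beginning, older towards the end" effect described above.
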
> For the ASF repo, I've got a ~3 times speedup for
> a "cold" checkout of SVN trunk (repo on a USB disk).
> But I may change / refine the placement strategy
> to e.g. put all props with mergeinfo in one place.
>> And, if we're thinking about evaluating the results: what should one
>> focus on? Any particular use cases that should get a significant
>> positive effect? Any use cases that might possibly be negatively
>> affected?
> There are two main points of interest for me:
> * does the conversion work or is it missing something
> for your repo?
> * does "cold" I/O go down? By how much and for
> which operations?
> I found that using a USB disk to store the repo is
> actually pretty neat because you can simply unplug
> it and the OS will discard all cached data.
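
For the second point, one way to get comparable numbers is to time the same command against the original and the reorganized copy, unplugging and re-plugging the disk before each run. The helper below is only a sketch: `time_command` and `compare` are hypothetical names, and any repository paths passed in are placeholders.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare(before_cmd, after_cmd):
    """Time both commands; return (before, after, speedup factor)."""
    before = time_command(before_cmd)
    after = time_command(after_cmd)
    return before, after, before / after
```

For example, `time_command(["svn", "checkout", "file:///mnt/usb/repo/trunk", "wc"])` with a placeholder path would time one cold checkout, assuming `svn` is on the PATH. Since the cache has to be dropped between runs, calling `time_command` once per plug-in cycle is probably more practical than `compare`.
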
> -- Stefan^2.

Received on 2012-10-16 11:01:40 CEST

This is an archived mail posted to the Subversion Dev mailing list.