[ Just changing the subject to get better visibility, so maybe more
people will read this and do some experiments with fsfs-reorg (not
with production data of course), to check the impact on "cold I/O".
Thanks for the explanation, Stefan. ]
On Sat, Oct 13, 2012 at 7:06 PM, Stefan Fuhrmann
<stefan.fuhrmann_at_wandisco.com> wrote:
> On Thu, Oct 11, 2012 at 1:32 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>>
>> On Wed, Oct 10, 2012 at 7:09 PM, Stefan Fuhrmann
>> <stefan.fuhrmann_at_wandisco.com> wrote:
...
>> > BTW, that code is not supposed to be *ever*
>> > used for production data.
>>
>> Ok, good to know. I just executed the tool and saw the prominent
>> warning, so that's pretty clear.
>
>
> What I'm trying to say goes even beyond that.
> This tool will (probably) never evolve into something
> that would be used outside our dev community.
>
>>
>> [ ... ]
>>
>> > Would be nice if people could use it to test /
>> > evaluate the results. The whole idea is to verify
>> > the method before attempting significant changes
>> > to the FSFS layer in 1.9.
>>
>> Can you summarize a bit (maybe you explained it already in some notes
>> file, but I don't quite remember) what it does again? What's the goal
>> really? Is it about reshuffling the data inside the pack files to be
>> more I/O efficient, while maintaining compatibility with existing
>> servers (so a reorg'ed repository can be read by any 1.x server)? If
>> so, how does it do that actually?
>
>
> SVN 1.8 will have 100% cache coverage in the sense
> that except for the format, fsfs.conf and friends, you
> can serve all r/o requests from the cache once it has
> been populated.
>
> The next logical step is to reduce the amount of I/O
> (physical seeks as well as data transfer). The basic
> idea is laid out in the fsfs-improvements notes, but the
> tool implementation goes a bit beyond that:
>
> * "overlay" revisions within a pack file, i.e. the offset
>   ranges overlap in the physical file
> * put all the "changes" lists at the beginning of the pack
>   file (used for log only)
> * starting at /@HEAD, add the node-rev, followed by its reps
>   (in delta order). Once a node is complete, continue
>   with its youngest sub-node until the tree is complete
> * continue with the youngest element not yet covered.
>
> The output should be compatible with SVN 1.6+
> (if the input was). Older formats are not supported -
> for simplicity.
>
> As a result, many related rep deltas should sit next to
> each other. Also, elements relevant for newer nodes
> should be at the beginning of the file and older ones
> tend to be moved to the end. Finally, we keep nodes
> that are next to each other in the tree close to one
> another in the resulting pack file.
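
For illustration only: a toy Python sketch of the placement order as
described above. It is not the fsfs-reorg implementation. NodeRev,
place_tree and plan_layout are hypothetical names, the in-memory tree
stands in for the real FSFS structures, reps are assumed to be given in
delta order already, and the "overlay" of revision offset ranges is not
modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRev:
    """Hypothetical stand-in for an FSFS node-rev; not the on-disk structure."""
    path: str
    rev: int                                       # revision that created this node-rev
    reps: list = field(default_factory=list)       # rep labels, assumed to be in delta order
    children: list = field(default_factory=list)   # sub-nodes (empty for files)


def place_tree(node, order, covered):
    """Emit a node-rev, then its reps, then descend into the youngest sub-node first."""
    key = (node.path, node.rev)
    if key in covered:          # shared with a younger tree, already placed
        return
    covered.add(key)
    order.append(("noderev", node.path, node.rev))
    for rep in node.reps:
        order.append(("rep", rep))
    for child in sorted(node.children, key=lambda n: n.rev, reverse=True):
        place_tree(child, order, covered)


def plan_layout(changed_revs, roots):
    """Return the planned item order for one pack file."""
    # All "changes" lists go first (their relative order is an assumption here).
    order = [("changes", rev) for rev in changed_revs]
    covered = set()
    # Start at the youngest root (HEAD); older roots only add what is not yet covered.
    for root in sorted(roots, key=lambda n: n.rev, reverse=True):
        place_tree(root, order, covered)
    return order
```

With a two-revision toy repository where /README is unchanged since r1,
the r2 tree is placed first and the r1 root only contributes its own
node-rev, because /README is already covered.
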
>
> For the ASF repo, I've got a ~3x speedup for
> a "cold" checkout of SVN trunk (repo on a USB disk).
>
> But I may change / refine the placement strategy
> to e.g. put all props with mergeinfo in one place.
>
>> And, if we're thinking about evaluating the results: what should one
>> focus on? Any particular use cases that should get a significant
>> positive effect? Any use cases that might possibly be negatively
>> affected?
>
>
> There are two main points of interest for me:
>
> * does the conversion work or is it missing something
> for your repo?
> * does "cold" I/O go down? By how much and for
> which operations?
>
> I found that using a USB disk to store the repo is
> actually pretty neat because you can simply unplug
> it and the OS will discard all cached data.
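
For anyone who wants to compare "cold" timings before and after the
reorg, a minimal timing sketch in Python, assuming the operation can be
run as a single command line. time_command and speedup are hypothetical
helper names; the script only measures wall-clock time and does nothing
to clear the OS cache, so the disk still has to be unplugged and
re-plugged before each run.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def speedup(before_seconds, after_seconds):
    """How many times faster the second run was compared to the first."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    # Everything after the script name is the command to time, for example
    # an "svn checkout -q" of a copy of the repository into a fresh directory.
    elapsed = time_command(sys.argv[1:])
    print(f"{elapsed:.2f} s")
```

Running it once against a copy of the original repository and once
against the reorganized copy, with the same command otherwise, gives
the two numbers to feed into speedup().
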
>
>
> -- Stefan^2.
>
--
Johan
Received on 2012-10-16 11:01:40 CEST