[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: warnings in fsfs-reorg.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Sat, 13 Oct 2012 19:06:34 +0200

On Thu, Oct 11, 2012 at 1:32 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:

> On Wed, Oct 10, 2012 at 7:09 PM, Stefan Fuhrmann
> <stefan.fuhrmann_at_wandisco.com> wrote:
> > On Tue, Oct 9, 2012 at 10:19 PM, Johan Corveleyn <jcorvel_at_gmail.com>
> wrote:
> >>
> >> I haven't built trunk in a couple of weeks. I'm catching up now, and I
> >> see a lot of new warnings coming from fsfs-reorg.c (compiled with
> >> Visual C Express 2008 on WinXP). Stefan (or anyone else), can you take
> >> a look at these?
> >
> >
> > Yes, I can. But maybe not before the weekend.
> > As long as your repository does not contain pack
> > files > 4GB, you should be safe. If it does,
> > changes are that you will run out of memory
> > on that repo anyway ;)
>
> Ah yes, especially on my 32-bit system :-). There's no rush, I just
> noticed the warnings and felt the urge to report them :-) ...
>
> > BTW, that code is not supposed to be *ever*
> > used for production data.
>
> Ok, good to know. I just executed the tool and saw the prominent
> warning, so that's pretty clear.
>

What I'm trying to say goes even beyond that.
This tool will (probably) never evolve into something
that would be used outside our dev community.

> [ ... ]
>
> > Would be nice if people could use it to test /
> > evaluate the results. The hole idea is to verify
> > the method before attempting significant changes
> > to the FSFS layer in 1.9.
>
> Can you summarize a bit (maybe you explained it already in some notes
> file, but I don't quite remember) what it does again? What's the goal
> really? Is it about reshuffling the data inside the pack files to be
> more I/O efficient, while maintaining compatibility with existing
> servers (so a reorg'ed repository can be read by any 1.x server)? If
> so, how does it do that actually?
>

SVN 1.8 will have 100% cache coverage in the sense
that except for the format, fsfs.conf and friends, you
can serve all r/o requests from the cache once that
got populated.

The next logical step is to reduce the amount of I/O
(physical seeks as well as data transfer). The basic
idea is layed out the fsfs-improvements notes but the
tool implementation goes a bit beyond that:

* "overlay" revisions within a pack file, i.e. the offset
  ranges overlap in the physical file
* put all the "changes" lists at the begin of the pack file
  (used for log only)
* starting at /@HEAD, add node-rev, followed by reps
  (in delta-order). Once a node is complete, continue
  with its youngest sub-node until the tree is complete
* Continue with the youngest element not covered.

The output should be compatible with SVN 1.6+
(if the input was). Older formats are not supported -
for simplicity.

As a result, many related rep deltas should sit next to
each other. Also, elements relevant for newer nodes
should be at the beginning of the file and older ones
tend to be moved to the end. Finally, we keep nodes
that are next to each other in the tree close to one
another in the resulting pack file.

For the ASF repo, I've got a ~3 times speedup for
a "cold" checkout of SVN trunk (repo on an USB disk).

But I may change / refine the placement stragegy
to e.g. put all props with mergeinfo in one place.

And, if we're thinking about evaluating the results: what should one
> focus on? Any particular use cases that should get a significant
> positive effect? Any use cases that might possibly be negatively
> affected?
>

There are two main points of interest for me:

* does the conversion work or is it missing something
  for your repo?
* does "cold" I/O go down? By how much and for
  which operations?

I found that using an USB disk to store the repo is
actually pretty neat because you can simply unplug
it and the OS will discard all cached data.

-- Stefan^2.

-- 
*
Join us this October at Subversion Live
2012<http://www.wandisco.com/svn-live-2012>
 for two days of best practice SVN training, networking, live demos,
committer meet and greet, and more! Space is limited, so get signed up
today<http://www.wandisco.com/svn-live-2012>
!
*
Received on 2012-10-13 19:07:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.