
Re: Subversion 1.9.0-dev FSFS performance tests

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 7 Jul 2014 20:44:28 +0200

On Mon, Jul 7, 2014 at 5:54 PM, C. Michael Pilato <cmpilato_at_collab.net> wrote:
> On 07/07/2014 11:23 AM, Branko Čibej wrote:
>> On 07.07.2014 17:07, C. Michael Pilato wrote:
>>> On 07/07/2014 10:58 AM, Ivan Zhakov wrote:
>>>> My technical opinion is that FSFS7/log addressing is slower by
>>>> design, because it does more work (read the index, then read the
>>>> data, instead of just reading the data) and only caching makes it
>>>> comparable in performance to FSFS6 repositories.
>>> I'm coming into this kinda late and after two weeks of vacation, so
>>> please forgive me if I misunderstand the above, but is it true that
>>> FSFS7 requires some kind of non-trivial caching just to match FSFS6's
>>> performance?
>>
>> Yup.

Nope.

F7 is all about I/O reduction. No I/O, no reduction. The savings
are significant and a factor of 2 is typical. Even SSDs see speedups.

Data size and read operations (when to read which noderev / rep / ...)
are roughly unchanged. Thus, if caches are hot, the extra addressing
overhead costs you something between 0% (hot SVN caches) and
10% CPU (hot OS caches only).
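
To put Ivan's point in concrete terms: the extra step is one index
lookup before the data read. A minimal sketch (hypothetical names and
a flat in-memory index, not our actual API; the real F7 L2P index is
paged and packed and lives on disk itself, which is exactly what the
caches amortize):

    #include <stdio.h>

    /* Hypothetical index entry: logical item number -> physical offset. */
    typedef struct { long item; long offset; } index_entry_t;

    /* F6-style physical addressing: the offset is known up front,
       so it is a single seek + read. */
    static size_t read_physical(FILE *rev_file, long offset,
                                char *buf, size_t len)
    {
      fseek(rev_file, offset, SEEK_SET);
      return fread(buf, 1, len, rev_file);
    }

    /* F7-style logical addressing: resolve the item number through
       the index first (read #1; a cache hit once warm), then fetch
       the data (read #2). */
    static size_t read_logical(FILE *rev_file,
                               const index_entry_t *index, size_t entries,
                               long item, char *buf, size_t len)
    {
      size_t i;
      for (i = 0; i < entries; i++)
        if (index[i].item == item)
          return read_physical(rev_file, index[i].offset, buf, len);
      return 0; /* unknown item */
    }

With hot caches, read #1 degenerates into a memory lookup, which is
where the 0-10% figure above comes from.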

F7 adds another feature that had to be made opt-in: "block-read".
Instead of reading only a few hundred bytes, it makes SVN parse the
whole 64k block that the OS provides anyway and put the data into
cache. In environments with slow fopen(), that should save CPU, but
it requires significant SVN caches to eventually use the prefetched
data. I'm particularly keen to see how much of an impact that makes
on Windows.
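
For those who want to experiment: the opt-in knob lives in the
repository's db/fsfs.conf. From memory it looks like the snippet
below; please double-check the section name against the template
that a 1.9 svnadmin writes out.

    [caches]
    ### Opt into block-read; it only pays off with sizeable SVN caches.
    block-read = true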

> May I then presume that for folks who have many repositories being
> hosted from a single server, FSFS7 will necessarily bring either a CPU
> performance hit (insufficient cache) or a RAM requirement/consumption
> hit (sufficient, ginormous cache)? Or is the cache configuration
> perhaps per-server rather than per-repository?

I just started the Windows tests in a "realistic" environment
with a 4GB RAM server managing > 50GB of repository data.
The goal is clearly that the default config is not slower than
before and that the computational overhead is roughly the same
as what we added when introducing manifest files for packed repos.
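
As to the last question: the membuffer cache is per server process,
not per repository, i.e. all repositories served by one svnserve or
httpd instance share a single cache whose size you pick once. The
knobs have been there since 1.7; for example (paths are made up):

    # svnserve: shared in-memory cache size in MB (default 16)
    svnserve -d -r /srv/repos --memory-cache-size 1024

    # mod_dav_svn: size in kBytes, plus what to cache
    SVNInMemoryCacheSize 1048576
    SVNCacheTextDeltas on
    SVNCacheFullTexts on

So the RAM hit is one server-wide budget, not N per-repo copies.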

The problem with all this measuring is that it is very hard
to create an environment that behaves roughly as the real
world does (multiple repositories being created over a long
period of time, interleaved with each other). It took me a whole
day to rewrite the copy script that creates meaningful data
sets for systems whose operation cannot be controlled (SAN).

-- Stefan^2.
Received on 2014-07-07 20:44:57 CEST
