
Re: Performance Results on Windows

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 14 Aug 2014 00:19:52 +0200

On Wed, Aug 13, 2014 at 11:03 PM, Mark Phippard <markphip_at_gmail.com> wrote:

> On Wed, Aug 13, 2014 at 4:45 PM, Stefan Fuhrmann <
> stefan.fuhrmann_at_wandisco.com> wrote:
>
>>> It seems that CollabNET and other hosting providers possibly have one
>>> of the worst configurations for the log-addressing feature. Multiple users
>>> access a number of gigabyte-sized repositories over HTTP (so there
>>> is not enough memory for enormous caches).
>>
>>
>> Well, the key would be GB-sized projects (data
>> transferred during export).
>>
>> I don't know how I feel about getting CollabNet involved.
>> My concern is that they simply won't have the time to
>> do it because we are not talking about "please, would
>> you run those 2 commands for me?" but rather a two
>> week effort.
>>
>>
> Not to speak for Ivan, but I think he is simply saying that sites that are
> hosting a significant number of SVN repositories are unlikely to be able to
> benefit from some of the things like the large cache sizes.
>

And that is perfectly fine. I only wanted to make sure that
those caches are actually too small to cover most of the
data. E.g. the SVN project itself is 100MB in a 50GB repo
and can easily be cached.
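
To make the "can easily be cached" point concrete, here is a rough
back-of-the-envelope sketch. The 256 MB cache size is a purely
hypothetical figure chosen for illustration; only the 100MB and 50GB
numbers come from the example above.

```python
MB = 10**6
GB = 10**9

def cache_coverage(working_set_bytes, cache_bytes):
    """Fraction of the working set that fits into the cache (capped at 1.0)."""
    if working_set_bytes <= 0:
        raise ValueError("working set must be positive")
    return min(1.0, cache_bytes / working_set_bytes)

# Hypothetical cache size, not a recommended or default setting.
cache = 256 * MB

project_coverage = cache_coverage(100 * MB, cache)  # the project actually exported
repo_coverage = cache_coverage(50 * GB, cache)      # the whole repository
```

What matters is the size of the data actually transferred, not the
size of the repository: the project fits entirely, while the full
repository would be covered to well under 1%.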

> I am not against making SVN faster in special controlled situations where
> you can fine tune a server for performance. Just please do not make it
> slower than it already is for the rest of us that cannot do that.
>

Agreed. Measurements so far indicate that the cold setup
has no performance regressions. It is only the (completely)
hot OS cache bits where CPU overhead is visible.

> To give an idea from a recent server I am dealing with, there are
> approximately 22K repositories with about 5.5 TB of data. They are served
> via Apache with prefork MPM. There is no room for adding some massive RAM
> cache to this, and I doubt it would help anyway.
>

Format 7 is designed for speeding up the *non-cached*
data access. The recent fine tuning ensured that it will
cope with the small-ish default caches for its temporary
state (e.g. you don't want to read the same directory
over and over again).

To benefit from format 7, you only need to pack your
repositories. Non-packed repos don't see any significant
difference (<5% slower due to more data being read for
c/o, ~20% faster for log) with default settings.
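
Packing itself is done with "svnadmin pack REPOS_PATH". To see which
repositories on a server are already in that state, something along
these lines may do. It is only a sketch and assumes the usual FSFS
on-disk layout: db/format starting with the format number and
db/min-unpacked-rev holding the first revision that has not been
packed yet. The helper names are made up for this example.

```python
from pathlib import Path

def read_fsfs_state(repo_root):
    """Return (format_number, min_unpacked_rev) for an FSFS repository.

    Assumes db/format begins with the format number and that
    db/min-unpacked-rev, if present, contains a single revision number.
    """
    db = Path(repo_root) / "db"
    format_number = int((db / "format").read_text().splitlines()[0].strip())
    marker = db / "min-unpacked-rev"
    min_unpacked = int(marker.read_text().strip()) if marker.exists() else 0
    return format_number, min_unpacked

def is_packed_f7(repo_root):
    """True if the repository is format 7 or later and has packed revisions."""
    format_number, min_unpacked = read_fsfs_state(repo_root)
    return format_number >= 7 and min_unpacked > 0
```

A repository for which is_packed_f7() returns False is either still on
an older format or has not been packed at all, so it would fall into
the "no significant difference" case described above.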

> FWIW, if the new format is primarily faster in specific situations, then
> why don't we just not make it the default format and instead make you
> specify a specific option when creating the repository to use the format?
> Then people can choose to opt-in to the format if they have an environment
> where it will be useful and their server will be tuned accordingly?
>

The specific situation is "packed repo and not all data in
cache". Configuring larger caches etc. will help f7 repos
more than f6, but this is an added benefit.

People can choose their repository format versions already.
Not making format 7 the default is a possibility but does
not change the range of choices that admins have. And since
we don't require people to upgrade their repositories, they
have full control over if, when and where they roll out the new
format.
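
For illustration, a provisioning script could pin the format at
creation time through the --compatible-version option of "svnadmin
create". The sketch below only builds and runs that command line; the
function names and the example path are made up, and it assumes that
compatibility with 1.8 yields a format 6 repository.

```python
import subprocess

def create_repo_command(repo_path, compatible_version=None):
    """Build the svnadmin command line for creating a repository.

    compatible_version, e.g. "1.8", is passed via --compatible-version;
    None leaves the choice to the installed svnadmin's default.
    """
    cmd = ["svnadmin", "create"]
    if compatible_version is not None:
        cmd += ["--compatible-version", compatible_version]
    cmd.append(repo_path)
    return cmd

def create_repo(repo_path, compatible_version=None):
    """Run svnadmin and raise CalledProcessError if it fails."""
    subprocess.run(create_repo_command(repo_path, compatible_version), check=True)
```

Calling create_repo("/srv/svn/demo", "1.8") would keep a new repository
on the older format, while omitting the version opts in to whatever the
server's svnadmin creates by default.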

-- Stefan^2.
Received on 2014-08-14 00:20:22 CEST

This is an archived mail posted to the Subversion Dev mailing list.
