
Re: svn commit: r1586922 - in /subversion/trunk/subversion: include/private/svn_io_private.h include/svn_types.h libsvn_subr/io.c libsvn_subr/stream.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Wed, 23 Apr 2014 15:10:58 +0200

On Fri, Apr 18, 2014 at 6:21 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:

> On 18 April 2014 15:46, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
> wrote:
> > On Fri, Apr 18, 2014 at 10:06 AM, Ivan Zhakov <ivan_at_visualsvn.com>
> wrote:
> >>
> >> On 14 April 2014 20:11, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> >> > On 13 April 2014 08:40, <stefan2_at_apache.org> wrote:
> >> >> Author: stefan2
> >> >> Date: Sun Apr 13 04:40:40 2014
> >> >> New Revision: 1586922
> >> >>
> >> >> URL: http://svn.apache.org/r1586922
> >> >> Log:
> >> >> Speed up file / stream comparison, i.e. minimize the processing
> >> >> overhead for finding the first mismatch.
> >> >>
> >> >> The approach is two-sided. Instead of fetching SVN__STREAM_CHUNK_SIZE
> >> >> from all sources before comparing data, we start with a much lower 4kB
> >> >> and increase the chunk size until we reach SVN__STREAM_CHUNK_SIZE while
> >> >> making sure that all reads are naturally aligned. So, we quickly find
> >> >> mismatches near the beginning of the file.
> >> >>
> >> >> At the other end, we bump SVN__STREAM_CHUNK_SIZE to 64kB, which
> >> >> gives better throughput for longer distances to the first mismatch -
> >> >> without causing ill effects in APR's memory management.
> >> >>
> >> >> * subversion/include/svn_types.h
> >> >>   (SVN__STREAM_CHUNK_SIZE): Bump to 64k and add some documentation on
> >> >>   the general restrictions for future changes.
> >> >>
> >> > Hi Stefan,
> >> >
> >> > You effectively reverted my recent fix (r1581296) for high memory
> >> > usage with many repositories open at the same time (about 500k per
> >> > repository). Also please consider r857671 and the discussion before
> >> > that commit:
> >> > http://svn.haxx.se/dev/archive-2004-11/0123.shtml
> >> >
> >> > So please revert.
> >> >
> >> Sorry, maybe I was not clear enough, but I was vetoing this change.
> >> Given no reply, I'm going to revert it.
> >
> >
> > [Been AFK for the last 3 days]
> >
> > Don't revert; your veto is based on faulty measurements & assumptions.
> > Here the shortlist:
> >
> My veto is still my veto, you cannot judge it.
>

But of course, I can judge it and do many other things to it as
I please.

You are right that I could not veto your veto - and I'm not trying
to do that. I'll try, however, to demonstrate that it does not have
a technical justification. If I succeed, it has never been a valid
veto in the first place.

> You don't get the point: you changed a *global* constant used

Well, one point of global constants is their central changeability
without jeopardizing consistency, i.e. they may assume any value
without breaking the code (insert a gazillion exceptions here).
They should not be changed wantonly, though.

> in multiple situations for just an FSFS optimization. The value for
> this constant was discussed a long time ago and was reduced for
> serious reasons. You changed it just to fix specific FSFS cases.
> I believe it's the wrong way to code.
>

My change was in no way related to FSFS, and I doubt that it would
have any impact there at all: config files don't care, as they are
too small, and repo I/O uses an entirely independent block size (or
just the APR 4k default).

You, however, made the initial change to config_file.c all about
repositories - and since you are not using BDB - about FSFS:

On Fri, Apr 18, 2014 at 10:06 AM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
>
> You effectively reverted my recent fix (r1581296) for high memory
> usage with many repositories open at the same time (about 500k per
> repository). Also please consider r857671 and the discussion before
> that commit:
> http://svn.haxx.se/dev/archive-2004-11/0123.shtml
>

The 500k claim is demonstrably false. It is about 150k in 1.7,
200k in 1.8 and 46k on /trunk before recent optimizations that
brought it down to 10k. The 1.7 and 1.8 usage was due to every
membuffer cache frontend using a private scratch pool.

So, your reasoning for the initial patch has been faulty and cannot
serve as a basis to veto further changes. The patch itself is fine,
though, since it now uses our standard constant for buffer sizes.

The post you are citing is the same one that I already mentioned in
my initial reply. It clearly says that 64k works just as well as 16k.
So, it does not counter my reasoning. However, 10 years have passed
and have seen the rise of SSDs, which are only now unveiling their
full potential.

For cold disc caches, I begin to see improvements in 'svn st' on
large 'touched' files when the disc throughput is 1GB/s or better.
For hot disc caches (commit after status), the gain is ~15%.
That was the motivation for my commit.
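For illustration, the "slow start" growth described in the quoted log
message can be sketched roughly like this. This is an in-memory sketch
with made-up names (first_mismatch, START_CHUNK, MAX_CHUNK), not the
actual libsvn_subr/stream.c code, which reads from svn_stream_t sources:

```c
#include <stddef.h>
#include <string.h>

/* Start at 4kB and double the chunk size up to the (new) 64kB
   SVN__STREAM_CHUNK_SIZE, doubling only when the current offset is
   aligned to the larger size, so every read of size S starts at a
   multiple of S (natural alignment). */
#define START_CHUNK (4 * 1024)
#define MAX_CHUNK  (64 * 1024)   /* SVN__STREAM_CHUNK_SIZE after r1586922 */

/* Return the offset of the first mismatch between a and b, or -1. */
static long
first_mismatch(const char *a, const char *b, long len)
{
  long offset = 0;
  long chunk = START_CHUNK;

  while (offset < len)
    {
      long to_cmp = (len - offset < chunk) ? (len - offset) : chunk;

      if (memcmp(a + offset, b + offset, (size_t)to_cmp) != 0)
        {
          long i;
          for (i = 0; i < to_cmp; i++)        /* locate the exact byte */
            if (a[offset + i] != b[offset + i])
              return offset + i;
        }

      offset += to_cmp;
      if (chunk < MAX_CHUNK && offset % (chunk * 2) == 0)
        chunk *= 2;    /* reads of 4k, 4k, 8k, 16k, 32k, then 64k onward */
    }

  return -1;
}
```

A mismatch in the first few kB is found after a single small read,
while long identical prefixes are processed in full 64kB chunks.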

On Fri, Apr 18, 2014 at 6:21 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:

> You also made libsvn_wc a bit slower, because while your 'slow start'
> compare algorithm may work fine for FSFS, it doesn't work well for
> comparing on-disk files in a WC.
>

Do you have any numbers to back up your claim? Here are mine
for the test suite execution on my macbook (Linux, ramdisk):

r1587968 1m54.371s (ra_svn) 1m40.437s (ra_local)
r1586922 1m54.608s (ra_svn) 1m41.495s (ra_local)
r1586921 1m54.208s (ra_svn) 1m41.936s (ra_local)

So, there is a mild indication that the 'svn log' improvements
might have helped, but on the whole the results are as expected:
file sizes are very small and histories are short; tuning for
throughput should not help much.

> > * SVN__STREAM_CHUNK_SIZE is only to be used with files and
> > streams (duh!) - not with data structures that persist in RAM.
> > If your use of that constant were to actually affect the size of a
> > repo instance, that use would be faulty.
> >
> > * parse_context_t is only instantiated in one place,
> > svn_config__parse_stream, and will be destroyed at the end of that
> > function. So, there is no persistent memory use from this one.
> This depends on allocator and heap behavior.
>

Only in the sense that freed memory can be reused when
needed. I guess that is a safe assumption - unless you
want to claim that using pools is pointless after all.
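The lifetime pattern being argued here can be illustrated with a
malloc-based sketch. All names (parse_ctx_t, parse_sketch, CHUNK_SIZE)
are hypothetical stand-ins; the real code allocates the context and
buffer from an APR pool that is destroyed before
svn_config__parse_stream returns:

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)   /* stand-in for SVN__STREAM_CHUNK_SIZE */

/* Hypothetical stand-in for parse_context_t. */
typedef struct parse_ctx_t
{
  char *buffer;          /* read buffer, sized CHUNK_SIZE */
  size_t buffer_size;
} parse_ctx_t;

/* The context and its buffer exist only while parsing runs, so the
   buffer size raises the *transient* peak during parsing, not the
   memory an open repository keeps allocated afterwards. */
static int
parse_sketch(void)
{
  parse_ctx_t ctx;

  ctx.buffer_size = CHUNK_SIZE;
  ctx.buffer = malloc(ctx.buffer_size);    /* pool allocation in real code */
  if (!ctx.buffer)
    return -1;

  memset(ctx.buffer, 0, ctx.buffer_size);  /* ... parsing would go here ... */

  free(ctx.buffer);                        /* pool destroyed before return */
  return 0;
}
```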

> > * According to *my* measurements, an FSFS instance takes about
> > 46kB just after fs_open() under 64-bit Unix. Of those 46k,
> > 64k were used - according to you - for the stream buffer alone ...
> >
> Yes, on trunk FSFS instances don't use so much memory, because
> specific subpools were added and APR buffered files are avoided. But
> these issues can reappear in the future, as they did in Subversion 1.8.x.
>

You probably had a systematic error in your measurements.
To measure marginal memory usage, you need to open
thousands of repo instances (they may all be the same one on disk).

My macbook now happily opens 1M repos without running OOM.
What is your test scenario?

> So, I still veto this change. Please revert it. Then I suggest you
> make this stream comparison algorithm private to FSFS, because it
> makes sense only for 'expensive' streams.
>

I'm not aware that FSFS would need any stream comparison
because it is all hashsum based. Again, FSFS is *not* the
reason for the change.

Is there any reason left unaddressed, why you veto the change?

> I'm ready to discuss increasing SVN__STREAM_CHUNK_SIZE to 64k or any
> other value, if there are serious reasons for doing this, but only
> after reverting this change from trunk.
>

-- Stefan^2.
Received on 2014-04-23 15:11:31 CEST

This is an archived mail posted to the Subversion Dev mailing list.
