[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1586922 - in /subversion/trunk/subversion: include/private/svn_io_private.h include/svn_types.h libsvn_subr/io.c libsvn_subr/stream.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Mon, 14 Apr 2014 14:34:07 +0200

On Sun, Apr 13, 2014 at 5:28 PM, Bert Huijben <bert_at_qqmail.nl> wrote:

>
>
> > -----Original Message-----
> > From: stefan2_at_apache.org [mailto:stefan2_at_apache.org]
> > Sent: zondag 13 april 2014 06:41
> > To: commits_at_subversion.apache.org
> > Subject: svn commit: r1586922 - in /subversion/trunk/subversion:
> > include/private/svn_io_private.h include/svn_types.h libsvn_subr/io.c
> > libsvn_subr/stream.c
> >
> > Author: stefan2
> > Date: Sun Apr 13 04:40:40 2014
> > New Revision: 1586922
> >
> > URL: http://svn.apache.org/r1586922
> > Log:
> > Speed up file / stream comparison, i.e. minimize the processing overhead
> > for finding the first mismatch.
> >
> > The approach is two-sided. Instead of fetching SVN__STREAM_CHUNK_SIZE
> > from all sources before comparing data, we start with a much lower 4kB
> > and increase the chunk size until we reach SVN__STREAM_CHUNK_SIZE
> > while
> > making sure that all reads are naturally aligned. So, we quickly find
> > mismatches near the beginning of the file.
> >
> > On the other end side, we bump the SVN__STREAM_CHUNK_SIZE to 64kB
> > which
> > gives better throughput for longer distances to the first mismatch -
> > without causing ill effects in APR's memory management.
> >
> > * subversion/include/svn_types.h
> > (SVN__STREAM_CHUNK_SIZE): Bump to 64k and add some documentation
> > on
> > the general restrictions for future changes.
>
> This reverts recent changes to reduce server side load... (see config file
> behavior changes, recently discussed on dev_at_s.a.o")

> Please revert this and use a different variable...
>

I disagree. The documentation clearly states the application
of SVN__STREAM_CHUNK_SIZE. See also
http://svn.apache.org/viewvc?view=revision&revision=r857671
http://svn.haxx.se/dev/archive-2004-11/0123.shtml

If you accidentally used for a different purpose, it would
be you who had to fix that.

However, I assume you are talking about r1581296. The critical
bit there was to go from 100k (non-paged = fragmented allocation
in APR pool allocator) to < 80k (paged = non-fragmented) for
the whole parse_context_t. This is not violated by bumping
the buffer size to 64k.

> And I really wonder how you determined that the rest of the patch is going
> to help *for all users* both client and server side.
>

I don't have an obligation to verify that *all* users are going
to benefit but I determined the following (elaborating on what
is already stated in the commit message and the code itself):

* Increasing the allocation size for buffers from 16k to 64k
  will not affect allocation performance. It will also no affect
  overall memory consumption if those buffers are temporary
  (this condition is met by file and stream buffers).

* Functions that allow for an early exit (in this patch: file /
  stream comparison) can now exit after reading 8/12k
  instead of 32/48k. So, they are now slightly faster than
  before and won't get hurt by the larger ultimate chunk size.

There may be more functions that have a regular (non-exceptional)
early exit and would benefit from svn_io__next_chunk_size.
A quick scan over all users of SVN__STREAM_CHUNK_SIZE
didn't bring up something obvious, though.

The driver of this patch has been handling working copies
with large binaries in it that got "touched" but not modified.
Using a larger chunk size gave a 15% higher throughput
in 'svn st' on my Macbook.

Working copies on NFS volumes with "normal" file changes
should benefit from the reduced initial chunk size. I haven't
tested that, though. Local drives don't show any over 16k
but something like 2..5% over using a fixed 64k block size.

Do you have a scenario in mind where this change is actually
counter-productive? According to conventional wisdom, larger
chunks should always tend to be faster than smaller ones.

-- Stefan^2.
Received on 2014-04-14 14:34:40 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.