Re: svn commit: r1586922 - in /subversion/trunk/subversion: include/private/svn_io_private.h include/svn_types.h libsvn_subr/io.c libsvn_subr/stream.c

From: Ivan Zhakov <ivan_at_visualsvn.com>
Date: Mon, 14 Apr 2014 20:11:48 +0400

On 13 April 2014 08:40, <stefan2_at_apache.org> wrote:
> Author: stefan2
> Date: Sun Apr 13 04:40:40 2014
> New Revision: 1586922
>
> URL: http://svn.apache.org/r1586922
> Log:
> Speed up file / stream comparison, i.e. minimize the processing overhead
> for finding the first mismatch.
>
> The approach is two-sided. Instead of fetching SVN__STREAM_CHUNK_SIZE
> from all sources before comparing data, we start with a much lower 4kB
> and increase the chunk size until we reach SVN__STREAM_CHUNK_SIZE while
> making sure that all reads are naturally aligned. So, we quickly find
> mismatches near the beginning of the file.
>
> On the other side, we bump the SVN__STREAM_CHUNK_SIZE to 64kB which
> gives better throughput for longer distances to the first mismatch -
> without causing ill effects in APR's memory management.
>
> * subversion/include/svn_types.h
> (SVN__STREAM_CHUNK_SIZE): Bump to 64k and add some documentation on
> the general restrictions for future changes.
>
Hi Stefan,

You effectively reverted my recent fix (r1581296) for high memory
usage with many repositories open at the same time (about 500k per
repository). Also, please consider r857671 and the discussion before that
commit:
http://svn.haxx.se/dev/archive-2004-11/0123.shtml

So please revert.

> * subversion/include/private/svn_io_private.h
> (svn_io__next_chunk_size): New utility function generating the new
> read block size sequence.
>
> * subversion/libsvn_subr/io.c
> (svn_io__next_chunk_size): Implement.
> (contents_identical_p,
> contents_three_identical_p): Let the new utility determine the read
> block size.
>
> * subversion/libsvn_subr/stream.c
> (svn_stream_contents_same2): Ditto.
>
This optimization makes sense only for the FSFS case, where fetching
text/property content is expensive; it doesn't make sense for
comparing on-disk files. So could you please make this optimization
FSFS-specific, without affecting the generic API and behavior? It would
probably be better to just use a smaller chunk size for the FSFS case, btw.
Thanks in advance.
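
For reference, the read block size sequence described in the quoted log
message could look roughly like the sketch below. This is only an
illustration of the idea (start at 4kB, grow up to a 64kB cap, keep every
read naturally aligned), not the committed svn_io__next_chunk_size code.
All names and constants here (sketch_next_chunk_size, sketch_count_reads,
SKETCH_MIN_CHUNK, SKETCH_MAX_CHUNK) are hypothetical, and the sketch
assumes the caller only passes offsets produced by the sequence itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for this sketch; not the Subversion definitions. */
#define SKETCH_MIN_CHUNK ((size_t)4096)       /* initial 4kB read */
#define SKETCH_MAX_CHUNK ((size_t)64 * 1024)  /* 64kB cap from the log message */

/* Return the size of the next read, given the bytes consumed so far.
   Starting from 0 this yields 4k, 4k, 8k, 16k, 32k, 64k, 64k, ... */
size_t
sketch_next_chunk_size(size_t total_read)
{
  if (total_read == 0)
    return SKETCH_MIN_CHUNK;
  return total_read < SKETCH_MAX_CHUNK ? total_read : SKETCH_MAX_CHUNK;
}

/* Count the reads needed to cover LENGTH bytes, checking that each read
   starts at an offset that is a multiple of its own size. */
size_t
sketch_count_reads(size_t length)
{
  size_t offset = 0;
  size_t reads = 0;

  while (offset < length)
    {
      size_t chunk = sketch_next_chunk_size(offset);

      assert(offset % chunk == 0);  /* read is naturally aligned */
      offset += chunk;
      reads++;
    }
  return reads;
}
```

With these assumed values, a mismatch in the first 4kB costs a single small
read per source, while a long identical prefix is still covered mostly by
reads of the maximum size.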

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
Received on 2014-04-14 18:26:36 CEST

This is an archived mail posted to the Subversion Dev mailing list.
