Re: svn commit: r1700799 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Sat, 5 Sep 2015 00:45:49 +0200

On Wed, Sep 2, 2015 at 3:33 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan2_at_apache.org> writes:
>
> > Author: stefan2
> > Date: Wed Sep 2 13:04:51 2015
> > New Revision: 1700799
> >
> > URL: http://svn.apache.org/r1700799
> > Log:
> > [Combines r1698359 and r170078 into a single commit for better review.]
> > Introduce a stream wrapper object that adds buffering support to any
> > readable stream. Use it on the stdin streams in our CL tools.
> >
> > As it turns out, parsing data from a stdin byte-by-byte incurs a
> > massive overhead of 100% internal and 300% system load over a buffered
> > stream. 'svnadmin load-revprops' sees a 5 times speedup if all data
> > is in OS disc caches. This is a realistic assumption in a "final sync
> > and switch over to new repository" scenario.
> >
> > The other 2 uses of stdin either have less data to process (svnfsfs
> > load-index) or parse only a small fraction of the stream (svnadmin load).
> >
> > * subversion/include/svn_io.h
> > (svn_stream_wrap_buffered_read): Declare the new stream constructor
> API.
>
> [...]
>
> If buffering makes difference for how well svnadmin load, load-revprops and
> svnfsfs load-index behave, can't we use the standard mechanism for that,
> say, apr_file_open_flags_stdin(..., APR_BUFFERED)?
>

That is indeed a good idea, so I tried it in various ways:

Index: subversion/libsvn_subr/stream.c
===================================================================
--- subversion/libsvn_subr/stream.c (revision 1701330)
+++ subversion/libsvn_subr/stream.c (working copy)
@@ -1826,10 +1826,20 @@ svn_stream_for_stdin(svn_stream_t **in, apr_pool_t
apr_status_t apr_err;

   apr_err = apr_file_open_stdin(&stdin_file, pool);
+
+/* Alternatively to the above: tell APR to create a buffered file
+ apr_err = apr_file_open_flags_stdin(&stdin_file, APR_BUFFERED, pool);
+*/
   if (apr_err)
     return svn_error_wrap_apr(apr_err, "Can't open stdin");

+/* Alternatively to the svn_stream_wrap_buffered_read() call or
+ * apr_file_open_flags_stdin, retroactively add a buffer to the APR file
+ apr_file_buffer_set(stdin_file, apr_palloc(pool, 4096), 4096);
+*/
+
*in = svn_stream_from_aprfile2(stdin_file, TRUE, pool);
+ *in = svn_stream_wrap_buffered_read(*in, pool);

return SVN_NO_ERROR;
}

> What is the point of reimplementing something that's already a part of the
> APR?
>

As it turns out, APR buffering is not compatible with streamy files.
Any of the 2 alternatives in the patch above causes ra_test to hang.

The problem seems to be that APR does not handle EAGAIN nor
EWOULDBLOCK when reading in buffered mode. I.e. buffered mode
assumes that the contents is readily available. To fix this, our file
read wrappers would have to replicate this section from readwrite,c
(possibly in different variants for platforms):

#ifdef USE_WAIT_FOR_IO
        if (rv == -1 &&
            (errno == EAGAIN || errno == EWOULDBLOCK) &&
            thefile->timeout != 0) {
            apr_status_t arv = apr_wait_for_io_or_timeout(thefile, NULL, 1);
            if (arv != APR_SUCCESS) {
                *nbytes = bytes_read;
                return arv;
            }
            else {
                do {
                    rv = read(thefile->filedes, buf, *nbytes);
                } while (rv == -1 && errno == EINTR);
            }
        }
#endif

It seems to me that the stream wrapper is the cleaner solution and
that it fits nicely with the general ideas behind stream design pattern.
Having lower latency than even buffered APR files is a bonus.

-- Stefan^2.
Received on 2015-09-05 00:46:05 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]