Re: svn commit: r1700799 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 10 Sep 2015 15:28:42 +0200

On Wed, Sep 9, 2015 at 12:31 PM, Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
wrote:

> Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> writes:
>
> >> The original commit begins using svn_stream_wrap_buffered_read() during
> >> svnadmin load-revprops and svnfsfs load-index. This patch, however,
> does
> >> something entirely different and adds buffering to *every* stdin.
> >
> > Sorry for the confusion. I did not intend to actually change the
> > implementation of svn_stream_for_stdin this way but simply tried to
> > demonstrate a problem with APR buffer reads for "streamy" file handles.
>
> Thank you for the explanation.
>

After some debugging I found the reason for the ra-test hang. Once I knew
it,
it's been kind of obvious from the code: For buffered files,
apr_file_read() is
more or less equivalent to apr_file_write_full().

Changing that in APR might break APR internals (buffer may be half-filled
but not be at EOF) as well as user code (read_full required where a normal
read used to be sufficient).

> > The underlying problem is still present: If stdin can't deliver data fast
> > enough (e.g. some hick-up / long latency on the producer side of a dump |
> > load), a buffered APR file will error out while a non-buffered one will
> > simply wait & retry.
> >
> > However, I have yet to try and provoke the error specifically for the
> > dump | load scenario.
>
> I think that this problem is nonexistent.
>
> A program doesn't need to handle EAGAIN during read() unless it puts the
> file
> into the non-blocking mode with O_NONBLOCK [1]. We don't do that for
> STDIN,
> and, irrespectively of what happens with the data on the other side of the
> pipe, read() is going to block until the requested data is available.
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
>

You are right. I tried various methods to delay the producer side and
it never caused the consumer to error out. The ra-test hang mislead
me into believing that the EAGAIN handling was the problem.

So, you are absolutely right about APR buffering being enough for our
non-interactive "read stdin as a single large file" usage in load & friends.

My last concern is as follows. We need an alternative implementation
for svn_stream_for_stdin. I see 3 options:

1. svn_stream_for_stdin2 with buffered option.
2. Always enable buffering in svn_stream_for_stdin.
3. Some private API.

The problem with 1. is that if we use APR_BUFFERED for it, the new
API will be add leakage to the abstraction: the user has to know when
it is safe to enable buffering. This problem does not occur if the wrapper
stream is being used instead.

Thus, variant 2 is impossible with APR_BUFFERED but quite possible
using the buffering wrapper. The only way it could cause a problem is
if some code was to rely on the actual size of a read request from stdin.
Since the latter is managed by the OS + C runtime, it is hard to think of
a way to make this happen - but still. Having a fall-back would be nice,
i.e. variant 1 is safer than 2.

Option 3 is basically saying "the public API is not good enough but we
don't want to give you a better one". It is, however, the only variant
where I would be fully comfortable just using APR buffering.

Your thoughts?

-- Stefan^2.
Received on 2015-09-10 15:28:59 CEST

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]