[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Simplify and optimize RA-svn's string marshalling

From: Daniel Shahaf <danielsh_at_elego.de>
Date: Wed, 8 Jun 2011 06:15:46 +0300

Julian Foad wrote on Mon, Jun 06, 2011 at 15:45:11 +0100:
> On Fri, 2011-06-03, Daniel Shahaf wrote:
> > Julian Foad wrote on Fri, Jun 03, 2011 at 09:52:21 +0100:
> > > +++ subversion/libsvn_ra_svn/marshal.c (working copy)
> > > @@ -563,36 +563,6 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
> > > -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> > > - svn_stringbuf_t *stringbuf, apr_uint64_t len)
> > > -{
> > > - char readbuf[4096];
> > > - apr_size_t readbuf_len;
> > > -
> > > - /* We can't store strings longer than the maximum size of apr_size_t,
> > > - * so check for wrapping */
> > > - if (((apr_size_t) len) < len)
> >
> > Why did you remove this check?
>
> I was removing all this relatively new code and it wasn't there before,
> and I wasn't convinced it was very important. But it would provide a
> quick exit (instead of filling up all available memory) in some error
> cases or potential DOS attempts so I'll re-instate it.
>

Okay. (I asked because this hunk didn't seem related to what your
message said the patch did.)

> > > - return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL,
> > > - _("String length larger than maximum"));
> > > -
...
> > > + SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len));
> > > +
> > > + stringbuf->len += readbuf_len;
> > > + stringbuf->data[stringbuf->len] = '\0';
> > > + len -= readbuf_len;
> >
> > What happens if, at this point, there are less than READBUF_LEN bytes
> > available?
> >
> > (e.g., if the client terminates the connection in the middle of
> > a string literal.)
>
> This part of the behaviour is unchanged from the original code. I don't
> know what happens, but I imagine readbuf_read() returns an error,
> perhaps SVN_ERR_RA_SVN_CONNECTION_CLOSED which I see readbuf_input() can
> do.
>

Hmm. Right, it seems obvious when I look at the code now. (For some
reason I'm sure I was reading different code at the time I wrote the
question.)

But... isn't there a bug here?

[[[
Index: subversion/libsvn_ra_svn/marshal.c
===================================================================
--- subversion/libsvn_ra_svn/marshal.c (revision 1133231)
+++ subversion/libsvn_ra_svn/marshal.c (working copy)
@@ -285,14 +285,15 @@ static svn_error_t *readbuf_input(svn_r
 
                                   apr_size_t *len, apr_pool_t *pool)
 {
   svn_ra_svn__session_baton_t *session = conn->session;
+ apr_size_t wanted = len;
 
   if (session && session->callbacks &&
       session->callbacks->cancel_func)
     SVN_ERR((session->callbacks->cancel_func)(
               session->callbacks_baton));
 
   SVN_ERR(svn_ra_svn__stream_read(conn->stream, data, len));
- if (*len == 0)
+ if (*len < wanted)
     return svn_error_create(SVN_ERR_RA_SVN_CONNECTION_CLOSED, NULL, NULL);
 
   if (session)
]]]

From svn_stream_t doxygen:

 * Handlers are obliged to complete a read or write to the maximum
 * extent possible; thus, a short read with no associated error implies
 * the end of the input stream, and a short write should never occur
 * without an associated error.

> The revised patch is attached.
>
> - Julian
>

> Simplify and optimize RA-svn's string marshalling.
>
> This largely reverts r1028352 which merged r985606 and r1028092 from the
> performance branch. That change used separate functions to handle short and
> long strings. This change instead improves the original code to handle both
> short and long strings efficiently.
>
> Compared with the pre-r1028352 code, this version uses a 1-MB chunk size
> instead of 4-KB and eliminates one memcpy of the data.
>
> * subversion/libsvn_ra_svn/marshal.c
> (read_long_string): Remove.
> (read_string): Handle long strings with the same code as short strings, by
> reading directly into pre-allocated space, but pre-allocate no more than
> a reasonable chunk size at a time.
> --This line, and those below, will be ignored--
>
> Index: subversion/libsvn_ra_svn/marshal.c
> ===================================================================
> --- subversion/libsvn_ra_svn/marshal.c (revision 1132610)
> +++ subversion/libsvn_ra_svn/marshal.c (working copy)
> @@ -563,14 +563,13 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
>
> /* --- READING DATA ITEMS --- */
>
> -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF.
> - * POOL will be used for temporary allocations. */
> -static svn_error_t *
> -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> - svn_stringbuf_t *stringbuf, apr_uint64_t len)
> +/* Read LEN bytes from CONN into already-allocated structure ITEM.
> + * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string
> + * data is allocated in POOL. */
> +static svn_error_t *read_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> + svn_ra_svn_item_t *item, apr_uint64_t len)
> {
> - char readbuf[4096];
> - apr_size_t readbuf_len;
> + svn_stringbuf_t *stringbuf;
>
> /* We can't store strings longer than the maximum size of apr_size_t,
> * so check for wrapping */
> @@ -578,67 +577,36 @@ read_long_string(svn_ra_svn_conn_t *conn
> return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL,
> _("String length larger than maximum"));
>
> + /* Read the string in chunks. The chunk size is large enough to avoid
> + * re-allocation in typical cases, and small enough to ensure we do not
> + * pre-allocate an unreasonable amount of memory if (perhaps due to
> + * network data corruption or a DOS attack), we receive a bogus claim that
> + * a very long string is going to follow. In that case, we start small
> + * and wait for all that data to actually show up. This does not fully
> + * prevent DOS attacks but makes them harder (you have to actually send
> + * gigabytes of data). */
> + stringbuf = svn_stringbuf_create_ensure(
> + (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD
> + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD),
> + pool);
> +

Sorry to point this just now, but this change seems to be problematic
for the common case:

It will make *every* string read allocate 1MB of memory --- in every
single place in the protocol where a 'string' (as opposed to a 'word' or
a tuple) is expected. And, worse, the string will then *remain*
allocated in that 1MB memory space --- even if the string is just
"svn:author\0".

I think we want different default sizes for different strings --- i.e.,
a different size if the string is a property name or an fspath v. if
it's an svndiff chunk of file contents --- but I'm entirely unsure that
we can do that without violating layering or extending the protocol.

> + /* Read the string data directly into the string structure.
> + * Do it iteratively, if necessary. */
> while (len)
> {
> + apr_size_t readbuf_len
> + = (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD
> + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD);
> +
> + svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len + 1);
> + SVN_ERR(readbuf_read(conn, pool, stringbuf->data + stringbuf->len,
> + readbuf_len));
>
> + stringbuf->len += readbuf_len;

This is related to the overflow check on LEN earlier: because of having
that check, we know that this addition won't overflow.

> + stringbuf->data[stringbuf->len] = '\0';
> len -= readbuf_len;
> }
>
> /* Return the string properly wrapped into an RA_SVN item. */
> item->kind = SVN_RA_SVN_STRING;
> item->u.string = svn_stringbuf__morph_into_string(stringbuf);
Received on 2011-06-08 05:16:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.