[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Simplify and optimize RA-svn's string marshalling

From: Daniel Shahaf <danielsh_at_elego.de>
Date: Fri, 3 Jun 2011 13:56:49 +0300

Julian Foad wrote on Fri, Jun 03, 2011 at 09:52:21 +0100:
> Hi Stefan2 and others.
>
> This patch simplifies code in RA-svn/marshal.c by using a single code
> path to read both short and long strings efficiently. Using a single
> code path is beneficial for test coverage.
>
> It is an alternative to r1028352 which merged r985606 and r1028092 from
> the performance branch.
>
> I have run the test suite over RA-svn (with BDB) but haven't done any
> more testing than that.

Unless you built with SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD set to
something less than 20, this may not have tested the "Large string" code
path (when the loop loops).

> It's just something I noticed while looking at
> that code, it's not solving a problem I've noticed or anything like
> that. By inspection I believe it is more efficient than
> read_long_string() was, and unchanged for short strings, but if you're
> able to do any performance analysis on it, that would be great.
>
> - Julian
>

> Simplify and optimize RA-svn's string marshalling.
>
> This largely reverts r1028352 which merged r985606 and r1028092 from the
> performance branch. That change used separate functions to handle short and
> long strings. This change instead improves the original code to handle both
> short and long strings efficiently.
>
> Compared with the pre-r1028352 code, this version uses a 1-MB chunk size
> instead of 4-KB and eliminates one memcpy of the data.
>
> * subversion/libsvn_ra_svn/marshal.c
> (read_long_string): Remove.
> (read_string): Handle long strings with the same code as short strings, by
> reading directly into pre-allocated space, but pre-allocate no more than
> a reasonable chunk size at a time.
> --This line, and those below, will be ignored--
>
> Index: subversion/libsvn_ra_svn/marshal.c
> ===================================================================
> --- subversion/libsvn_ra_svn/marshal.c (revision 1130931)
> +++ subversion/libsvn_ra_svn/marshal.c (working copy)
> @@ -563,36 +563,6 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
>
> /* --- READING DATA ITEMS --- */
>
> -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF.
> - * POOL will be used for temporary allocations. */
> -static svn_error_t *
> -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> - svn_stringbuf_t *stringbuf, apr_uint64_t len)
> -{
> - char readbuf[4096];
> - apr_size_t readbuf_len;
> -
> - /* We can't store strings longer than the maximum size of apr_size_t,
> - * so check for wrapping */
> - if (((apr_size_t) len) < len)

Why did you remove this check?

> - return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL,
> - _("String length larger than maximum"));
> -
> - while (len)
> - {
> - readbuf_len = len > sizeof(readbuf) ? sizeof(readbuf) : (apr_size_t)len;
> -
> - SVN_ERR(readbuf_read(conn, pool, readbuf, readbuf_len));
> - /* Read into a stringbuf_t to so we don't allow the sender to allocate
> - * an arbitrary amount of memory without actually sending us that much
> - * data */
> - svn_stringbuf_appendbytes(stringbuf, readbuf, readbuf_len);
> - len -= readbuf_len;
> - }
> -
> - return SVN_NO_ERROR;
> -}
> -
> /* Read LEN bytes from CONN into already-allocated structure ITEM.
> * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string
> * data is allocated in POOL. */
> @@ -601,42 +571,34 @@ static svn_error_t *read_string(svn_ra_s
> {
> svn_stringbuf_t *stringbuf;
>
> - /* We should not use large strings in our protocol. However, we may
> - * receive a claim that a very long string is going to follow. In that
> - * case, we start small and wait for all that data to actually show up.
> - * This does not fully prevent DOS attacs but makes them harder (you
> - * have to actually send gigabytes of data).
> - */
> - if (len > SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD)
> - {
> - /* This string might take a large amount of memory. Don't allocate
> - * the whole buffer at once, so to prevent OOM issues by corrupted
> - * network data.
> - */
> - stringbuf = svn_stringbuf_create("", pool);
> - SVN_ERR(read_long_string(conn, pool, stringbuf, len));
> - }
> - else
> + /* Read the string in chunks. The chunk size is large enough to avoid
> + * re-allocation in typical cases, and small enough to ensure we do not
> + * pre-allocate an unreasonable amount of memory if (perhaps due to
> + * network data corruption or a DOS attack), we receive a bogus claim that
> + * a very long string is going to follow. In that case, we start small
> + * and wait for all that data to actually show up. This does not fully
> + * prevent DOS attacks but makes them harder (you have to actually send
> + * gigabytes of data). */
> + stringbuf = svn_stringbuf_create_ensure(
> + (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD
> + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD),
> + pool);
> +
> + /* Read the string data directly into the string structure.
> + * Do it iteratively, if necessary. */
> + while (len)
> {
> - /* This is a reasonably sized string. So, provide a buffer large
> - * enough to prevent re-allocation as long as the data transmission
> - * is not flawed.
> - */
> - stringbuf = svn_stringbuf_create_ensure((apr_size_t)len, pool);
> -
> - /* Read the string data directly into the string structure.
> - * Do it iteratively, if necessary.
> - */
> - while (len)
> - {
> - apr_size_t readbuf_len = (apr_size_t)len;
> - char *dest = stringbuf->data + stringbuf->len;
> - SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len));
> -
> - stringbuf->len += readbuf_len;
> - stringbuf->data[stringbuf->len] = '\0';
> - len -= readbuf_len;
> - }
> + apr_size_t readbuf_len
> + = (apr_size_t)(len < SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD
> + ? len : SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD);
> + char *dest = stringbuf->data + stringbuf->len;
> +
> + svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len);

Need ->len + ->len + 1, for NUL. (Oddly, this API requires this
off-by-one while _create_ensure() does it for you.)

> + SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len));
> +
> + stringbuf->len += readbuf_len;
> + stringbuf->data[stringbuf->len] = '\0';
> + len -= readbuf_len;

What happens if, at this point, there are less than READBUF_LEN bytes
available?

(e.g., if the client terminates the connection in the middle of
a string literal.)

> }
>
> /* Return the string properly wrapped into an RA_SVN item.
Received on 2011-06-03 12:58:00 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.