[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Simplify and optimize RA-svn's string marshalling

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Mon, 06 Jun 2011 15:45:11 +0100

On Fri, 2011-06-03, Daniel Shahaf wrote:
> Julian Foad wrote on Fri, Jun 03, 2011 at 09:52:21 +0100:
> > Hi Stefan2 and others.
> >
> > This patch simplifies code in RA-svn/marshal.c by using a single code
> > path to read both short and long strings efficiently. Using a single
> > code path is beneficial for test coverage.
> >
> > It is an alternative to r1028352 which merged r985606 and r1028092 from
> > the performance branch.
> >
> > I have run the test suite over RA-svn (with BDB) but haven't done any
> > more testing than that.
>
> Unless you built with SUSPICIOUSLY_HUGE_STRING_SIZE_THRESHOLD set to
> something less than 20, this may not have tested the "Large string" code
> path (when the loop loops).

Right. I've now tested that way (with threshold = 16 bytes) and
revealed a bug: I calculated the 'dest' address to read into before
calling stringbuf_ensure, but when the latter re-allocates memory it
moves the string so my 'dest' was wrong.

Fixed.

> > It's just something I noticed while looking at
> > that code, it's not solving a problem I've noticed or anything like
> > that. By inspection I believe it is more efficient than
> > read_long_string() was, and unchanged for short strings, but if you're
> > able to do any performance analysis on it, that would be great.
> >
> > - Julian
> >
>
> > Simplify and optimize RA-svn's string marshalling.
> >
> > This largely reverts r1028352 which merged r985606 and r1028092 from the
> > performance branch. That change used separate functions to handle short and
> > long strings. This change instead improves the original code to handle both
> > short and long strings efficiently.
> >
> > Compared with the pre-r1028352 code, this version uses a 1-MB chunk size
> > instead of 4-KB and eliminates one memcpy of the data.
> >
> > * subversion/libsvn_ra_svn/marshal.c
> > (read_long_string): Remove.
> > (read_string): Handle long strings with the same code as short strings, by
> > reading directly into pre-allocated space, but pre-allocate no more than
> > a reasonable chunk size at a time.
> > --This line, and those below, will be ignored--
> >
> > Index: subversion/libsvn_ra_svn/marshal.c
> > ===================================================================
> > --- subversion/libsvn_ra_svn/marshal.c (revision 1130931)
> > +++ subversion/libsvn_ra_svn/marshal.c (working copy)
> > @@ -563,36 +563,6 @@ svn_error_t *svn_ra_svn_write_tuple(svn_
> >
> > /* --- READING DATA ITEMS --- */
> >
> > -/* Read LEN bytes from CONN into a supposedly empty STRINGBUF.
> > - * POOL will be used for temporary allocations. */
> > -static svn_error_t *
> > -read_long_string(svn_ra_svn_conn_t *conn, apr_pool_t *pool,
> > - svn_stringbuf_t *stringbuf, apr_uint64_t len)
> > -{
> > - char readbuf[4096];
> > - apr_size_t readbuf_len;
> > -
> > - /* We can't store strings longer than the maximum size of apr_size_t,
> > - * so check for wrapping */
> > - if (((apr_size_t) len) < len)
>
> Why did you remove this check?

I was removing all this relatively new code and it wasn't there before,
and I wasn't convinced it was very important. But it would provide a
quick exit (instead of filling up all available memory) in some error
cases or potential DOS attempts so I'll re-instate it.

> > - return svn_error_create(SVN_ERR_RA_SVN_MALFORMED_DATA, NULL,
> > - _("String length larger than maximum"));
> > -
> > - while (len)
> > - {
> > - readbuf_len = len > sizeof(readbuf) ? sizeof(readbuf) : (apr_size_t)len;
> > -
> > - SVN_ERR(readbuf_read(conn, pool, readbuf, readbuf_len));
> > - /* Read into a stringbuf_t to so we don't allow the sender to allocate
> > - * an arbitrary amount of memory without actually sending us that much
> > - * data */
> > - svn_stringbuf_appendbytes(stringbuf, readbuf, readbuf_len);
> > - len -= readbuf_len;
> > - }
> > -
> > - return SVN_NO_ERROR;
> > -}
> > -
> > /* Read LEN bytes from CONN into already-allocated structure ITEM.
> > * Afterwards, *ITEM is of type 'SVN_RA_SVN_STRING', and its string
> > * data is allocated in POOL. */
> > @@ -601,42 +571,34 @@ static svn_error_t *read_string(svn_ra_s
> > {
[...]
> > + svn_stringbuf_ensure(stringbuf, stringbuf->len + readbuf_len);
>
> Need ->len + ->len + 1, for NUL. (Oddly, this API requires this
> off-by-one while _create_ensure() does it for you.)

Right; thanks. Fixed.

> > + SVN_ERR(readbuf_read(conn, pool, dest, readbuf_len));
> > +
> > + stringbuf->len += readbuf_len;
> > + stringbuf->data[stringbuf->len] = '\0';
> > + len -= readbuf_len;
>
> What happens if, at this point, there are less than READBUF_LEN bytes
> available?
>
> (e.g., if the client terminates the connection in the middle of
> a string literal.)

This part of the behaviour is unchanged from the original code. I don't
know what happens, but I imagine readbuf_read() returns an error,
perhaps SVN_ERR_RA_SVN_CONNECTION_CLOSED which I see readbuf_input() can
do.

The revised patch is attached.

- Julian

Received on 2011-06-06 16:45:49 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.