Re: svn commit: r986453 - /subversion/branches/performance/subversion/libsvn_subr/subst.c

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sat, 30 Oct 2010 20:30:45 +0200

stefan2_at_apache.org wrote on Tue, Aug 17, 2010 at 19:04:21 -0000:
> Author: stefan2
> Date: Tue Aug 17 19:04:21 2010
> New Revision: 986453
>
> URL: http://svn.apache.org/viewvc?rev=986453&view=rev
> Log:
> Speed up EOF and keyword translation in two ways.
>
> First, don't translate EOFs if the "is" equals the "ought".
> This reduces the number of calls to translate_newline as
> well as translate_write and ultimately apr_file_write.
>
> Second, optimize the memcspn() replacement implementation
> by checking 4 bytes at once. That results in about 3 CPU
> ticks / byte instead of roughly 8 to 10 ticks before.
>
> * subversion/libsvn_delta/subst.c
> (translation_baton): add optimization info field
> (create_translation_baton): initialize the new member
> (eol_unchanged): new utiltiy function to compare EOLs
> (translate_chunk): faster scanning for interesting chars;
> don't translate unchaning EOLs
>
> Modified:
> subversion/branches/performance/subversion/libsvn_subr/subst.c
>
> Modified: subversion/branches/performance/subversion/libsvn_subr/subst.c
> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/subst.c?rev=986453&r1=986452&r2=986453&view=diff
> ==============================================================================
> --- subversion/branches/performance/subversion/libsvn_subr/subst.c (original)
> +++ subversion/branches/performance/subversion/libsvn_subr/subst.c Tue Aug 17 19:04:21 2010
> @@ -839,6 +844,33 @@ create_translation_baton(const char *eol
> return b;
> }
>
> +/* Return TRUE if the EOL starting at BUF matches the eol_str member of B.
> + * Be aware of special cases like "\n\r\n" and "\n\n\r". For sequences like
> + * "\n$", the result will be FALSE since it is more efficient to handle
> + * that special case implicitly in the calling code by exiting the quick
> + * scan loop.
> + */
> +static APR_INLINE svn_boolean_t
> +eol_unchanged(struct translation_baton *b,
> + const char *buf)
> +{

I've tweaked the docstring for a bit of extra clarity, in r1029092.

> + /* if the first byte doesn't match, the whole EOL won't */
> + if (buf[0] != b->eol_str[0])
> + return FALSE;
> +
> + /* two-char EOLs must be a full match */
> + if (b->eol_str_len == 2)
> + return buf[1] == b->eol_str[1];
> +
> + /* The first char matches the required 1-byte EOL.
> + * But maybe, buf[] contains a 2-byte EOL?
> + * In that case, the second byte will be interesting
> + * and not be another EOL of its own.
> + */
> + return !b->interesting[(unsigned char)buf[1]] || buf[0] == buf[1];
> +}
> +

So, this function returns TRUE if BUF starts with \n and we want \n, or
with \r\n and we want \r\n, or --- when we want \r --- starts with \r
and followed by neither \n nor $.

The single caller ensures that buf[1] is not EOF (and hence is safe to
use as an index into interesting[]), but the docstring should state
that (as a precondition).

>
>
>
>
> -
> -
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
>
@@ -944,19 +976,71 @@ translate_chunk(svn_stream_t *dst, continue; } /* We're in the boring state; look for interest characters. */ len = 0; /* translate_newline will modify the baton for src_format_len==0 or may return an error if b->repair is FALSE. In all other cases, we can skip the newline translation as long as source EOL format and actual EOL format match. If there is a mismatch, translate_newline will be called regardless of nl_translation_skippable. */ if (b->nl_translation_skippable == svn_tristate_unknown && b->src_format_len > 0) { /* test whether translate_newline may return an error */ if (b->eol_str_len == b->src_format_len && strncmp(b->eol_str, b->src_format, b->eol_str_len) == 0) b->nl_translation_skippable = svn_tristate_true; else if (b->repair) b->nl_translation_skippable = svn_tristate_true; else b->nl_translation_skippable = svn_tristate_false; } /* We're in the boring state; look for interest characters. Offset len such that it will become 0 in the first iteration. */ len = 0 - b->eol_str_len; /* Look for the next EOL (or $) that actually needs translation. Stop there or at EOF, whichever is encountered first. */ do { /* skip current EOL */ len += b->eol_str_len; /* check 4 bytes at once to allow for efficient pipilening and to reduce loop condition overhead. */ while ((p + len + 4) <= end) { char sum = interesting[(unsigned char)p[len]] | interesting[(unsigned char)p[len+1]] | interesting[(unsigned char)p[len+2]] | interesting[(unsigned char)p[len+3]]; if (sum != 0) break; len += 4; } /* Found an interesting char or EOF in the next 4 bytes. Find its exact position. */ while ((p + len) < end && !interesting[(unsigned char)p[len]]) ++len; } while (b->nl_translation_skippable && /* can skip EOLs at all */ p + len + 2 < end && /* not too close to EOF */ eol_unchanged (b, p + len)); /* EOL format already ok */

Haven't read the whole hunk yet, but this jumped to my eye:

Since nl_translation_skippable is an svn_tristate_t, shouldn't this test
explicitly for 'svn_tristate_true'? (Otherwise, the test would evaluate
to true even for svn_tristate_false...)

Perhaps we should name the variable in a way that indicates its
non-booleanness --- e.g., tri_nl_translation_skippable --- ?

> - /* We wanted memcspn(), but lacking that, the loop below has
> - the same effect. Also, skip NUL characters.
> - */
> while ((p + len) < end && !interesting[(unsigned char)p[len]])
> len++;
>
> if (len)
> - SVN_ERR(translate_write(dst, p, len));
> -
> - p += len;
> + {
> + SVN_ERR(translate_write(dst, p, len));
> + p += len;
> + }
>
> /* Set up state according to the interesting character, if any. */
> if (p < end)
>
>
Received on 2010-10-30 20:33:37 CEST

This message: [ Message body ]
Next message: Peter Samuelson: "Re: SQLite and SELECT WHERE local_relpath LIKE"
Previous message: Daniel Shahaf: "Does at least one buildbot run the Python tests over BDB?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]