stefan2_at_apache.org wrote on Tue, Aug 17, 2010 at 19:04:21 -0000:
> Author: stefan2
> Date: Tue Aug 17 19:04:21 2010
> New Revision: 986453
>
> URL: http://svn.apache.org/viewvc?rev=986453&view=rev
> Log:
> Speed up EOF and keyword translation in two ways.
>
> First, don't translate EOFs if the "is" equals the "ought".
> This reduces the number of calls to translate_newline as
> well as translate_write and ultimately apr_file_write.
>
> Second, optimize the memcspn() replacement implementation
> by checking 4 bytes at once. That results in about 3 CPU
> ticks / byte instead of roughly 8 to 10 ticks before.
>
> * subversion/libsvn_delta/subst.c
> (translation_baton): add optimization info field
> (create_translation_baton): initialize the new member
> (eol_unchanged): new utiltiy function to compare EOLs
> (translate_chunk): faster scanning for interesting chars;
> don't translate unchaning EOLs
>
> Modified:
> subversion/branches/performance/subversion/libsvn_subr/subst.c
>
> Modified: subversion/branches/performance/subversion/libsvn_subr/subst.c
> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/subst.c?rev=986453&r1=986452&r2=986453&view=diff
> ==============================================================================
> --- subversion/branches/performance/subversion/libsvn_subr/subst.c (original)
> +++ subversion/branches/performance/subversion/libsvn_subr/subst.c Tue Aug 17 19:04:21 2010
> @@ -839,6 +844,33 @@ create_translation_baton(const char *eol
> return b;
> }
>
> +/* Return TRUE if the EOL starting at BUF matches the eol_str member of B.
> + * Be aware of special cases like "\n\r\n" and "\n\n\r". For sequences like
> + * "\n$", the result will be FALSE since it is more efficient to handle
> + * that special case implicitly in the calling code by exiting the quick
> + * scan loop.
> + */
> +static APR_INLINE svn_boolean_t
> +eol_unchanged(struct translation_baton *b,
> + const char *buf)
> +{
I've tweaked the docstring for a bit of extra clarity, in r1029092.
> + /* if the first byte doesn't match, the whole EOL won't */
> + if (buf[0] != b->eol_str[0])
> + return FALSE;
> +
> + /* two-char EOLs must be a full match */
> + if (b->eol_str_len == 2)
> + return buf[1] == b->eol_str[1];
> +
> + /* The first char matches the required 1-byte EOL.
> + * But maybe, buf[] contains a 2-byte EOL?
> + * In that case, the second byte will be interesting
> + * and not be another EOL of its own.
> + */
> + return !b->interesting[(unsigned char)buf[1]] || buf[0] == buf[1];
> +}
> +
So, this function returns TRUE if BUF starts with \n and we want \n, or
with \r\n and we want \r\n, or --- when we want \r --- starts with \r
and followed by neither \n nor $.
The single caller ensures that buf[1] is not EOF (and hence is safe to
use as an index into interesting[]), but the docstring should state
that (as a precondition).
> @@ -944,19 +976,71 @@ translate_chunk(svn_stream_t *dst,
> continue;
> }
>
> - /* We're in the boring state; look for interest characters. */
> - len = 0;
> + /* translate_newline will modify the baton for src_format_len==0
> + or may return an error if b->repair is FALSE. In all other
> + cases, we can skip the newline translation as long as source
> + EOL format and actual EOL format match. If there is a
> + mismatch, translate_newline will be called regardless of
> + nl_translation_skippable.
> + */
> + if (b->nl_translation_skippable == svn_tristate_unknown &&
> + b->src_format_len > 0)
> + {
> + /* test whether translate_newline may return an error */
> + if (b->eol_str_len == b->src_format_len &&
> + strncmp(b->eol_str, b->src_format, b->eol_str_len) == 0)
> + b->nl_translation_skippable = svn_tristate_true;
> + else if (b->repair)
> + b->nl_translation_skippable = svn_tristate_true;
> + else
> + b->nl_translation_skippable = svn_tristate_false;
> + }
> +
> + /* We're in the boring state; look for interest characters.
> + Offset len such that it will become 0 in the first iteration.
> + */
> + len = 0 - b->eol_str_len;
> +
> + /* Look for the next EOL (or $) that actually needs translation.
> + Stop there or at EOF, whichever is encountered first.
> + */
> + do
> + {
> + /* skip current EOL */
> + len += b->eol_str_len;
> +
> + /* check 4 bytes at once to allow for efficient pipilening
> + and to reduce loop condition overhead. */
> + while ((p + len + 4) <= end)
> + {
> + char sum = interesting[(unsigned char)p[len]]
> + | interesting[(unsigned char)p[len+1]]
> + | interesting[(unsigned char)p[len+2]]
> + | interesting[(unsigned char)p[len+3]];
> +
> + if (sum != 0)
> + break;
> +
> + len += 4;
> + }
> +
> + /* Found an interesting char or EOF in the next 4 bytes.
> + Find its exact position. */
> + while ((p + len) < end && !interesting[(unsigned char)p[len]])
> + ++len;
> + }
> + while (b->nl_translation_skippable && /* can skip EOLs at all */
> + p + len + 2 < end && /* not too close to EOF */
> + eol_unchanged (b, p + len)); /* EOL format already ok */
>
Haven't read the whole hunk yet, but this jumped to my eye:
Since nl_translation_skippable is an svn_tristate_t, shouldn't this test
explicitly for 'svn_tristate_true'? (Otherwise, the test would evaluate
to true even for svn_tristate_false...)
Perhaps we should name the variable in a way that indicates its
non-booleanness --- e.g., tri_nl_translation_skippable --- ?
> - /* We wanted memcspn(), but lacking that, the loop below has
> - the same effect. Also, skip NUL characters.
> - */
> while ((p + len) < end && !interesting[(unsigned char)p[len]])
> len++;
>
> if (len)
> - SVN_ERR(translate_write(dst, p, len));
> -
> - p += len;
> + {
> + SVN_ERR(translate_write(dst, p, len));
> + p += len;
> + }
>
> /* Set up state according to the interesting character, if any. */
> if (p < end)
>
>
Received on 2010-10-30 20:33:37 CEST