[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r986453 - /subversion/branches/performance/subversion/libsvn_subr/subst.c

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Thu, 04 Nov 2010 00:56:25 +0100

On 30.10.2010 20:30, Daniel Shahaf wrote:
> stefan2_at_apache.org wrote on Tue, Aug 17, 2010 at 19:04:21 -0000:
>> Author: stefan2
>> Date: Tue Aug 17 19:04:21 2010
>> New Revision: 986453
>>
>> URL: http://svn.apache.org/viewvc?rev=986453&view=rev
>> Log:
>> Speed up EOF and keyword translation in two ways.
>>
>> First, don't translate EOFs if the "is" equals the "ought".
>> This reduces the number of calls to translate_newline as
>> well as translate_write and ultimately apr_file_write.
>>
>> Second, optimize the memcspn() replacement implementation
>> by checking 4 bytes at once. That results in about 3 CPU
>> ticks / byte instead of roughly 8 to 10 ticks before.
>>
>> * subversion/libsvn_delta/subst.c
>> (translation_baton): add optimization info field
>> (create_translation_baton): initialize the new member
>> (eol_unchanged): new utiltiy function to compare EOLs
>> (translate_chunk): faster scanning for interesting chars;
>> don't translate unchaning EOLs
>>
>> Modified:
>> subversion/branches/performance/subversion/libsvn_subr/subst.c
>>
>> Modified: subversion/branches/performance/subversion/libsvn_subr/subst.c
>> URL: http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/subst.c?rev=986453&r1=986452&r2=986453&view=diff
>> ==============================================================================
>> --- subversion/branches/performance/subversion/libsvn_subr/subst.c (original)
>> +++ subversion/branches/performance/subversion/libsvn_subr/subst.c Tue Aug 17 19:04:21 2010
>> @@ -839,6 +844,33 @@ create_translation_baton(const char *eol
>> return b;
>> }
>>
>> +/* Return TRUE if the EOL starting at BUF matches the eol_str member of B.
>> + * Be aware of special cases like "\n\r\n" and "\n\n\r". For sequences like
>> + * "\n$", the result will be FALSE since it is more efficient to handle
>> + * that special case implicitly in the calling code by exiting the quick
>> + * scan loop.
>> + */
>> +static APR_INLINE svn_boolean_t
>> +eol_unchanged(struct translation_baton *b,
>> + const char *buf)
>> +{
> I've tweaked the docstring for a bit of extra clarity, in r1029092.
Thanks.
>> + /* if the first byte doesn't match, the whole EOL won't */
>> + if (buf[0] != b->eol_str[0])
>> + return FALSE;
>> +
>> + /* two-char EOLs must be a full match */
>> + if (b->eol_str_len == 2)
>> + return buf[1] == b->eol_str[1];
>> +
>> + /* The first char matches the required 1-byte EOL.
>> + * But maybe, buf[] contains a 2-byte EOL?
>> + * In that case, the second byte will be interesting
>> + * and not be another EOL of its own.
>> + */
>> + return !b->interesting[(unsigned char)buf[1]] || buf[0] == buf[1];
>> +}
>> +
> So, this function returns TRUE if BUF starts with \n and we want \n, or
> with \r\n and we want \r\n, or --- when we want \r --- starts with \r
> and followed by neither \n nor $.
Yes. It also returns FALSE if we want \n but find \n\r
and returns TRUE if we found \n\n\r (not actually testing
for the last \r). In the last case we will soon be called
again for the buf+1.
> The single caller ensures that buf[1] is not EOF (and hence is safe to
> use as an index into interesting[]), but the docstring should state
> that (as a precondition).
Done in r1030763.
>> @@ -944,19 +976,71 @@ translate_chunk(svn_stream_t *dst,
>> continue;
>> }
>>
>> - /* We're in the boring state; look for interest characters. */
>> - len = 0;
>> + /* translate_newline will modify the baton for src_format_len==0
>> + or may return an error if b->repair is FALSE. In all other
>> + cases, we can skip the newline translation as long as source
>> + EOL format and actual EOL format match. If there is a
>> + mismatch, translate_newline will be called regardless of
>> + nl_translation_skippable.
>> + */
>> + if (b->nl_translation_skippable == svn_tristate_unknown&&
>> + b->src_format_len> 0)
>> + {
>> + /* test whether translate_newline may return an error */
>> + if (b->eol_str_len == b->src_format_len&&
>> + strncmp(b->eol_str, b->src_format, b->eol_str_len) == 0)
>> + b->nl_translation_skippable = svn_tristate_true;
>> + else if (b->repair)
>> + b->nl_translation_skippable = svn_tristate_true;
>> + else
>> + b->nl_translation_skippable = svn_tristate_false;
>> + }
>> +
>> + /* We're in the boring state; look for interest characters.
>> + Offset len such that it will become 0 in the first iteration.
>> + */
>> + len = 0 - b->eol_str_len;
>> +
>> + /* Look for the next EOL (or $) that actually needs translation.
>> + Stop there or at EOF, whichever is encountered first.
>> + */
>> + do
>> + {
>> + /* skip current EOL */
>> + len += b->eol_str_len;
>> +
>> + /* check 4 bytes at once to allow for efficient pipilening
>> + and to reduce loop condition overhead. */
>> + while ((p + len + 4)<= end)
>> + {
>> + char sum = interesting[(unsigned char)p[len]]
>> + | interesting[(unsigned char)p[len+1]]
>> + | interesting[(unsigned char)p[len+2]]
>> + | interesting[(unsigned char)p[len+3]];
>> +
>> + if (sum != 0)
>> + break;
>> +
>> + len += 4;
>> + }
>> +
>> + /* Found an interesting char or EOF in the next 4 bytes.
>> + Find its exact position. */
>> + while ((p + len)< end&& !interesting[(unsigned char)p[len]])
>> + ++len;
>> + }
>> + while (b->nl_translation_skippable&& /* can skip EOLs at all */
>> + p + len + 2< end&& /* not too close to EOF */
>> + eol_unchanged (b, p + len)); /* EOL format already ok */
>>
> Haven't read the whole hunk yet, but this jumped to my eye:
>
> Since nl_translation_skippable is an svn_tristate_t, shouldn't this test
> explicitly for 'svn_tristate_true'? (Otherwise, the test would evaluate
> to true even for svn_tristate_false...)
>
> Perhaps we should name the variable in a way that indicates its
> non-booleanness --- e.g., tri_nl_translation_skippable --- ?
Fixed in 1029335. But modifying the tristate enum
definition as proposed in a recent post would have
fixed this as well.

-- Stefan^2.
Received on 2010-11-04 00:57:23 CET

This is an archived mail posted to the Subversion Dev mailing list.