On Sun, 2010-04-25, Stefan Fuhrmann wrote:
> here comes the second set of local optimizations.
> It contains a number of simple improvements over
> the original code.
Hi Stefan. Thanks for working on optimization. Saving a few cycles can
be useful if it's in frequently-executed code so that the overall gain
is significant. At the same time, we are trying to keep a good balance
against code simplicity. Much of our code is more complex and longer
than it should be already, so I want to be careful about any change that
makes it even longer.
Have you got (or please could you get) some measurements for these
optimisations that you could show us? From just looking at the patch, I
find it hard to believe that some of the changes make a consistent
difference, at least from the perspective of thinking about different
compilers. Specific comments and questions are below.
Also, in some cases you need to include a comment in the code saying
"This is done this way because it is faster than doing it the simple
way," otherwise the next person to edit the code will "improve" it back
to the way it was before.
> [[[
> Numerous local optimizations.
When you write these log messages, please could you write a few words
about each change to explain why it's an improvement. They are all
obvious to you, of course, but when the rest of us look at the patch,
some of the changes are not obvious at first. It would be good if each
us us could read through your patch and think to ourselves, "Yes, I
agree with Stefan's reasoning for that change."
> * subversion/include/svn_delta.h
> (enum svn_delta_action): ensure that these definitions
> will always have these values
I saw in a later part of the patch that this is so you can make the
function 'decode_instruction()' rely on those values. As you know,
adding the explicit "=0", "=1", "=2" on the enum definition does not
make any difference to the compiler, it just provides a hint to the
human developers that these values might be significant.
We need a stronger hint than this. Please add code comments in both the
enum definition and the function 'decode_instruction()', saying "The
enum values must match the instruction encoding values."
> * subversion/libsvn_delta/compose_delta.c
> (search_offset_index): use size_t to index memory
It looks like that is just for consistency with the other arguments. Or
does it play a part in speed optimization as well?
> (copy_source_ops): dito; optimize a common case
This is optimizing use of the C 'modulo' operator ('A % B') by
hand-coding B=1 and B=2 to use "& (B-1)" instead. Are you sure your C
compiler doesn't already do this? My intuition expects the compiler to
do it.
> * subversion/libsvn_delta/svndiff.c
> (decode_file_offset, decode_size): use slightly more
> efficient formulations
In these two functions you are expanding two lines of expressions into
many simpler lines, and introducing a temporary variable to eliminate
multiple loads and stores.
I am suspicious of this change. It is the compiler's job to transform
an expression into a longer stream of simple statements, and often the
compiler will do a better job than a human. (Yes I know hand-coded
optimisations can sometimes be useful.)
Also, these changes (and the 'modulo' change above) look like the sort
of change that could generate faster code on one compiler but slower
code on another.
If you could show the numbers, that would help me/us to understand the
trade-off.
> (decode_instruction): directly map action codes -
> made possible by defining the enum values
>
> * subversion/libsvn_subr/checksum.c
> (svn_checksum_parse_hex): eliminate CTR function calls
Did you mean 'CRT' (C Run-Time library)? If so, would
svn_ctype_isxdigit() be useful, or does calling that have the same
overhead as calling the CRT? It seems wrong to write out ASCII-hex
conversion functions in long-hand code, and is the sort of code I would
normally replace by a function call whenever I see it. At least make it
a separate function or macro in the source file, with a comment
explaining why the system 'isxdigit' or 'svn_ctype_isxdigit' is not
being used.
> * subversion/libsvn_subr/stream.c
> (stream_readline): numbytes is invariant until EOF
> ]]]
> plain text document attachment (PerformanceNumerics.patch):
> Index: subversion/include/svn_delta.h
> ===================================================================
> --- subversion/include/svn_delta.h (revision 937673)
> +++ subversion/include/svn_delta.h (working copy)
> @@ -100,7 +100,7 @@
> * It must be the case that 0 <= @a offset < @a offset +
> * @a length <= size of source view.
> */
> - svn_txdelta_source,
> + svn_txdelta_source = 0,
>
> /** Append the @a length bytes at @a offset in the target view, to the
> * target.
> @@ -119,7 +119,7 @@
> * useful in encoding long runs of consecutive characters, long runs
> * of CR/LF pairs, etc.
> */
> - svn_txdelta_target,
> + svn_txdelta_target = 1,
>
> /** Append the @a length bytes at @a offset in the window's @a new string
> * to the target.
> @@ -129,7 +129,7 @@
> * order with no overlap at the moment; svn_txdelta_to_svndiff()
> * depends on this.
> */
> - svn_txdelta_new
> + svn_txdelta_new = 2
> };
>
> /** A single text delta instruction. */
> Index: subversion/libsvn_delta/compose_delta.c
> ===================================================================
> --- subversion/libsvn_delta/compose_delta.c (revision 937673)
> +++ subversion/libsvn_delta/compose_delta.c (working copy)
> @@ -165,10 +165,10 @@
> as hint because most lookups come as a sequence of decreasing values
> for OFFSET and they concentrate on the lower end of the array. */
>
> -static int
> -search_offset_index(const offset_index_t *ndx, apr_size_t offset, int hint)
> +static apr_size_t
> +search_offset_index(const offset_index_t *ndx, apr_size_t offset, apr_size_t hint)
> {
> - int lo, hi, op;
> + apr_size_t lo, hi, op;
>
> assert(offset < ndx->offs[ndx->length]);
>
> @@ -635,13 +635,13 @@
> static void
> copy_source_ops(apr_size_t offset, apr_size_t limit,
> apr_size_t target_offset,
> - int hint,
> + apr_size_t hint,
> svn_txdelta__ops_baton_t *build_baton,
> const svn_txdelta_window_t *window,
> const offset_index_t *ndx,
> apr_pool_t *pool)
> {
> - int op_ndx = search_offset_index(ndx, offset, hint);
> + apr_size_t op_ndx = search_offset_index(ndx, offset, hint);
> for (;; ++op_ndx)
> {
> const svn_txdelta_op_t *const op = &window->ops[op_ndx];
> @@ -694,7 +694,10 @@
> The idea here is to transpose the pattern, then generate
> another overlapping copy. */
> const apr_size_t ptn_length = off[0] - op->offset;
> - const apr_size_t ptn_overlap = fix_offset % ptn_length;
> + const apr_size_t ptn_overlap = ptn_length > 2
> + ? fix_offset % ptn_length
> + : fix_offset & (ptn_length-1);
> +
> apr_size_t fix_off = fix_offset;
> apr_size_t tgt_off = target_offset;
> assert(ptn_length > ptn_overlap);
> Index: subversion/libsvn_delta/svndiff.c
> ===================================================================
> --- subversion/libsvn_delta/svndiff.c (revision 937673)
> +++ subversion/libsvn_delta/svndiff.c (working copy)
> @@ -361,16 +361,24 @@
> const unsigned char *p,
> const unsigned char *end)
> {
> + svn_filesize_t temp = 0;
> if (p + MAX_ENCODED_INT_LEN < end)
> end = p + MAX_ENCODED_INT_LEN;
> /* Decode bytes until we're done. */
> - *val = 0;
> while (p < end)
> {
> - *val = (*val << 7) | (*p & 0x7f);
> - if (((*p++ >> 7) & 0x1) == 0)
> + unsigned c = *p;
Here 'c' is 'unsigned' meaning 'unsigned int'. (Please write 'unsigned
int' in full if that's intended.) Shouldn't it be 'unsigned char', or
do you think using an int is faster? Or, as in the next change below,
did you want it to match the type of 'temp' - in this case
svn_filesize_t?
> + temp = (temp << 7) | (c & 0x7f);
> + ++p;
> +
> + if (c < 0x80)
> + {
> + *val = temp;
> return p;
> + }
> }
> +
> + *val = temp;
> return NULL;
> }
>
> @@ -382,16 +390,24 @@
> const unsigned char *p,
> const unsigned char *end)
> {
> + apr_size_t temp = 0;
Style nit: please put a blank line after the declarations block, here
and after declaring 'c' below, and also in the function above.
> if (p + MAX_ENCODED_INT_LEN < end)
> end = p + MAX_ENCODED_INT_LEN;
> /* Decode bytes until we're done. */
> - *val = 0;
> while (p < end)
> {
> - *val = (*val << 7) | (*p & 0x7f);
> - if (((*p++ >> 7) & 0x1) == 0)
> + apr_size_t c = *p;
(By contrast, here 'c' is 'apr_size_t'.)
> + temp = (temp << 7) | (c & 0x7f);
> + ++p;
> +
> + if (c < 0x80)
> + {
> + *val = temp;
> return p;
> + }
> }
> +
> + *val = temp;
> return NULL;
> }
>
> @@ -458,27 +474,30 @@
> const unsigned char *p,
> const unsigned char *end)
> {
> + apr_size_t c;
> + apr_size_t action;
> +
> if (p == end)
> return NULL;
>
> /* Decode the instruction selector. */
> - switch ((*p >> 6) & 0x3)
> - {
> - case 0x0: op->action_code = svn_txdelta_source; break;
> - case 0x1: op->action_code = svn_txdelta_target; break;
> - case 0x2: op->action_code = svn_txdelta_new; break;
> - case 0x3: return NULL;
> - }
> + c = *p;
> + p++;
Write "c = *p++". (Try not to make the code more long-winded than
necessary.)
> + action = (c >> 6) & 0x3;
> + if (action >= 0x3)
> + return NULL;
>
> + op->action_code = (enum svn_delta_action)(action);
> +
> /* Decode the length and offset. */
> - op->length = *p++ & 0x3f;
> + op->length = c & 0x3f;
> if (op->length == 0)
> {
> p = decode_size(&op->length, p, end);
> if (p == NULL)
> return NULL;
> }
> - if (op->action_code != svn_txdelta_new)
> + if (action != svn_txdelta_new)
> {
> p = decode_size(&op->offset, p, end);
> if (p == NULL)
> Index: subversion/libsvn_subr/checksum.c
> ===================================================================
> --- subversion/libsvn_subr/checksum.c (revision 937673)
> +++ subversion/libsvn_subr/checksum.c (working copy)
> @@ -206,12 +206,39 @@
>
> for (i = 0; i < len; i++)
> {
> - if ((! isxdigit(hex[i * 2])) || (! isxdigit(hex[i * 2 + 1])))
> + char c1 = hex[i * 2];
> + char c2 = hex[i * 2 + 1];
> +
> + svn_boolean_t bad = FALSE;
> + unsigned char upper;
> + unsigned char lower;
> +
> + if (c1 > '9')
> + {
> + bad = (c1 > 'f') || (c1 < 'a');
> + upper = c1 - 'a' + 10;
> + }
> + else
> + {
> + bad = c1 < '0';
> + upper = c1 - '0';
> + }
> +
> + if (c2 > '9')
> + {
> + bad |= (c2 > 'f') || (c2 < 'a');
> + lower = c2 - 'a' + 10;
> + }
> + else
> + {
> + bad |= c2 < '0';
> + lower = c2 - '0';
> + }
> +
> + if (bad)
> return svn_error_create(SVN_ERR_BAD_CHECKSUM_PARSE, NULL, NULL);
>
> - ((unsigned char *)(*checksum)->digest)[i] =
> - (( isalpha(hex[i*2]) ? hex[i*2] - 'a' + 10 : hex[i*2] - '0') << 4) |
> - ( isalpha(hex[i*2+1]) ? hex[i*2+1] - 'a' + 10 : hex[i*2+1] - '0');
> + ((unsigned char *)(*checksum)->digest)[i] = (upper << 4) | lower;
> is_zeros |= (*checksum)->digest[i];
> }
>
> Index: subversion/libsvn_subr/stream.c
> ===================================================================
> --- subversion/libsvn_subr/stream.c (revision 937673)
> +++ subversion/libsvn_subr/stream.c (working copy)
> @@ -355,9 +355,9 @@
>
> /* Read into STR up to and including the next EOL sequence. */
> match = eol_str;
> + numbytes = 1;
> while (*match)
> {
> - numbytes = 1;
'numbytes' is indeed invariant until the end of the loop, but this
change is surely in the category of micro-optimisation. Unless you show
me a significant measured gain, we should instead move the declaration
of 'numbytes' *into* the scope of this block, for code readability.
Then we should re-design the whole "read line" approach to avoid the
need for one-character-at-a-time calls to the stream's "read" function.
(Such a re-design would require API changes.)
> SVN_ERR(svn_stream_read(stream, &c, &numbytes));
> if (numbytes != 1)
> {
>
- Julian
Received on 2010-04-26 12:03:33 CEST