Julian Foad wrote:
> On Sun, 2010-04-25, Stefan Fuhrmann wrote:
>
>> here comes the second set of local optimizations.
>> It contains a number of simple improvements over
>> the original code.
>>
>
> Hi Stefan. Thanks for working on optimization. Saving a few cycles can
> be useful if it's in frequently-executed code so that the overall gain
> is significant. At the same time, we are trying to keep a good balance
> against code simplicity. Much of our code is more complex and longer
> than it should be already, so I want to be careful about any change that
> makes it even longer.
>
> Have you got (or please could you get) some measurements for these
> optimisations that you could show us? From just looking at the patch, I
> find it hard to believe that some of the changes make a consistent
> difference, at least from the perspective of thinking about different
> compilers. Specific comments and questions are below.
>
All this code is over 3 weeks old now. I simply sent
the Easter weekend optimizing SVN, APR and zlib.
The total performance gains are tremendous.
Now, I'm in the tedious process of slicing the changes
into smaller, easily digestible chunks. Once I got them
all posted, I plan to write an extensive analysis of the
current SVN performance including several measurements
for different OS, FS, patch levels, repository formats etc.
Bottom line: Subversion-sized projects could technically
be checked out in less than 1 second (wire-speed permitting).
> Also, in some cases you need to include a comment in the code saying
> "This is done this way because it is faster than doing it the simple
> way," otherwise the next person to edit the code will "improve" it back
> to the way it was before.
>
Good point. Will do.
>
>> [[[
>> Numerous local optimizations.
>>
>
> When you write these log messages, please could you write a few words
> about each change to explain why it's an improvement. They are all
> obvious to you, of course, but when the rest of us look at the patch,
> some of the changes are not obvious at first. It would be good if each
> us us could read through your patch and think to ourselves, "Yes, I
> agree with Stefan's reasoning for that change."
>
>
>
>> * subversion/include/svn_delta.h
>> (enum svn_delta_action): ensure that these definitions
>> will always have these values
>>
>
> I saw in a later part of the patch that this is so you can make the
> function 'decode_instruction()' rely on those values. As you know,
> adding the explicit "=0", "=1", "=2" on the enum definition does not
> make any difference to the compiler, it just provides a hint to the
> human developers that these values might be significant.
>
> We need a stronger hint than this. Please add code comments in both the
> enum definition and the function 'decode_instruction()', saying "The
> enum values must match the instruction encoding values."
>
Oh! I wasn't aware of that - being a C++ guy.
But as long as comments can "fix" that, so be it ... ;)
>
>> * subversion/libsvn_delta/compose_delta.c
>> (search_offset_index): use size_t to index memory
>>
>
> It looks like that is just for consistency with the other arguments. Or
> does it play a part in speed optimization as well?
>
Basically, it's a consistency improvement. However,
on LP64 and LLP64 systems, it saves a conversion
from int -> ptrdiff_t.
>
>> (copy_source_ops): dito; optimize a common case
>>
>
> This is optimizing use of the C 'modulo' operator ('A % B') by
> hand-coding B=1 and B=2 to use "& (B-1)" instead. Are you sure your C
> compiler doesn't already do this? My intuition expects the compiler to
> do it.
>
How would the compiler know that ptn_length is often
1 or 2? In the general case, such conditional code would
be a gross pessimization.
>
>> * subversion/libsvn_delta/svndiff.c
>> (decode_file_offset, decode_size): use slightly more
>> efficient formulations
>>
>
> In these two functions you are expanding two lines of expressions into
> many simpler lines, and introducing a temporary variable to eliminate
> multiple loads and stores.
>
> I am suspicious of this change. It is the compiler's job to transform
> an expression into a longer stream of simple statements, and often the
> compiler will do a better job than a human. (Yes I know hand-coded
> optimisations can sometimes be useful.)
>
Again, the compiler can't do much here: unless it summons
the daemons of global and feedback optimization (or the
user tells him not to care), there is no way to be *sure* that
p and val point to disjoint memory regions. Thus, it cannot
optimize the *val = f(*val) code.
My code saves 2 out of 3 memory accesses at the expense
of an additional write at the end.
> Also, these changes (and the 'modulo' change above) look like the sort
> of change that could generate faster code on one compiler but slower
> code on another.
>
I'm very conscious about the impact on other compilers
and architectures. Therefore, I restrict my extensive arsenal
of "coding tricks" to what should be beneficial on most
CPUs and is hard for compilers to "screw up".
> If you could show the numbers, that would help me/us to understand the
> trade-off.
>
>
These changes were added in the later stages of my
optimization session. At that point, the major parts had
already gotten streamlined leaving other sections where
runtime seeped through cracks all over the place.
Each of the changes "putties" one of the cracks I came
across. So, individual savings are in the .5 .. 2% range
but the total effect is somewhere between 3 and 10%
(depending on platform, compiler, etc.).
>> (decode_instruction): directly map action codes -
>> made possible by defining the enum values
>>
>> * subversion/libsvn_subr/checksum.c
>> (svn_checksum_parse_hex): eliminate CTR function calls
>>
>
> Did you mean 'CRT' (C Run-Time library)? If so, would
> svn_ctype_isxdigit() be useful, or does calling that have the same
> overhead as calling the CRT? It seems wrong to write out ASCII-hex
> conversion functions in long-hand code, and is the sort of code I would
> normally replace by a function call whenever I see it. At least make it
> a separate function or macro in the source file, with a comment
> explaining why the system 'isxdigit' or 'svn_ctype_isxdigit' is not
> being used.
>
>
My problem was that I couldn't see what CRT function
took *that* long. Parsing a checksum should be so rare
that it wouldn't even show up in the profiler. But it did.
Elsethread, isalpha() got blamed for it. I will change my
code using the feedback I got.
-- Stefan^2.
Received on 2010-04-27 02:11:50 CEST