Stefan Fuhrmann wrote:
> Julian Foad wrote:
> > I saw in a later part of the patch that this is so you can make the
> > function 'decode_instruction()' rely on those values. As you know,
> > adding the explicit "=0", "=1", "=2" on the enum definition does not
> > make any difference to the compiler, it just provides a hint to the
> > human developers that these values might be significant.
> >
> > We need a stronger hint than this. Please add code comments in both the
> > enum definition and the function 'decode_instruction()', saying "The
> > enum values must match the instruction encoding values."
> >
> Oh! I wasn't aware of that - being a C++ guy.
You may have misunderstood me. The rules are the same in C and C++.
What I meant is that the compiler will assign the values { 0, 1, 2 }
regardless of whether they are specified explicitly. It only makes a
difference when the next human maintainer wants to insert another value
into the list ... and then the explicit numbers make them think "Hmm, I
should take care not to re-number the old values when I insert this new
one."
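To illustrate what I'm asking for (a sketch; treat the names as
illustrative of the svn_delta.h definition rather than the exact header
text):

  enum svn_delta_action {
    /* WARNING: The enum values must match the instruction encoding
     * values relied upon by decode_instruction().  Do not renumber
     * or reorder without updating that function. */
    svn_txdelta_source = 0,
    svn_txdelta_target = 1,
    svn_txdelta_new    = 2
  };

With or without the explicit "= 0", "= 1", "= 2", the compiler assigns
exactly these values; the comment is what carries the real constraint.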
> But as long as comments can "fix" that, so be it ... ;)
[...]
> >> (copy_source_ops): ditto; optimize a common case
> >
> > This is optimizing use of the C 'modulo' operator ('A % B') by
> > hand-coding B=1 and B=2 to use "& (B-1)" instead. Are you sure your C
> > compiler doesn't already do this? My intuition expects the compiler to
> > do it.
> >
> How would the compiler know that ptn_length is often
> 1 or 2? In the general case, such conditional code would
> be a gross pessimization.
My intuition may well be wrong. My reasoning was that testing for 1 or
2 is very quick, while a general-case modulo calculation is very slow.
Therefore adding a test to every modulo calculation cannot be a gross
pessimization, just a small overhead. But if typical compilers don't do
it, then your manual optimization is useful.
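For the record, here is roughly the transformation I understand your
patch to make (a sketch with made-up names, not your actual code):

  /* Because 1 and 2 are powers of two, "ofs % len" can be computed
   * as "ofs & (len - 1)" in those cases, avoiding a division
   * instruction.  "pattern_offset" and its parameters are
   * illustrative names only. */
  static apr_size_t
  pattern_offset(apr_size_t ofs, apr_size_t ptn_length)
  {
    if (ptn_length <= 2)               /* common case: length 1 or 2 */
      return ofs & (ptn_length - 1);   /* x % 1 == 0,  x % 2 == x & 1 */
    else
      return ofs % ptn_length;         /* general case: real division */
  }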
> >> * subversion/libsvn_delta/svndiff.c
> >> (decode_file_offset, decode_size): use slightly more
> >> efficient formulations
> >
> > In these two functions you are expanding two lines of expressions into
> > many simpler lines, and introducing a temporary variable to eliminate
> > multiple loads and stores.
> >
> > I am suspicious of this change. It is the compiler's job to transform
> > an expression into a longer stream of simple statements, and often the
> > compiler will do a better job than a human. (Yes I know hand-coded
> > optimisations can sometimes be useful.)
> >
> Again, the compiler can't do much here: unless it summons
> the daemons of global and feedback optimization (or the
> user tells it not to care), there is no way to be *sure* that
> p and val point to disjoint memory regions. Thus, it cannot
> optimize the *val = f(*val) code.
>
> My code saves 2 out of 3 memory accesses at the expense
> of an additional write at the end.
OK, so the temporary variable is useful. Is splitting the expression up
into many small statements also useful?
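For reference, the transformation as I understand it, sketched with
illustrative names and a simplified byte encoding (not your actual
patch):

  /* Variant 1: VAL is updated through the pointer inside the loop.
   * Since an (unsigned char *) may legally alias any object, the
   * compiler must reload and re-store *val on every iteration. */
  static const unsigned char *
  decode_uint_v1(apr_uint64_t *val, const unsigned char *p)
  {
    *val = 0;
    do
      *val = (*val << 7) | (*p & 0x7f);
    while (*p++ & 0x80);
    return p;
  }

  /* Variant 2: the accumulator lives in a local variable, which the
   * compiler can keep in a register; there is a single store at the
   * end.  This is the "2 out of 3 memory accesses saved". */
  static const unsigned char *
  decode_uint_v2(apr_uint64_t *val, const unsigned char *p)
  {
    apr_uint64_t temp = 0;
    do
      temp = (temp << 7) | (*p & 0x7f);
    while (*p++ & 0x80);
    *val = temp;
    return p;
  }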
> > Also, these changes (and the 'modulo' change above) look like the sort
> > of change that could generate faster code on one compiler but slower
> > code on another.
> >
> I'm very conscious of the impact on other compilers
> and architectures. Therefore, I restrict my extensive arsenal
> of "coding tricks" to what should be beneficial on most
> CPUs and is hard for compilers to "screw up".
Thanks.
> > If you could show the numbers, that would help me/us to understand the
> > trade-off.
> >
> These changes were added in the later stages of my
> optimization session. At that point, the major parts had
> already been streamlined, leaving other sections where
> runtime seeped through cracks all over the place.
>
> Each of the changes "putties" one of the cracks I came
> across. So, individual savings are in the .5 .. 2% range
> but the total effect is somewhere between 3 and 10%
> (depending on platform, compiler, etc.).
Thanks. My problem is that I find it hard to believe some of these
changes are worthwhile without seeing the numbers.
- Julian