On Mon, Dec 9, 2013 at 11:01 AM, Branko Čibej <brane_at_wandisco.com> wrote:
> Hi Stefan,
>
> could you please shed some light on your "minor opimization" commit:
>
Minor optimization is meant to be a more general term:
Slightly less code and slightly faster. The first bit being
the main justification.
> > ------------------------------------------------------------------------
> > r1532186 | stefan2 | 2013-10-15 06:59:00 +0200 (Tue, 15 Oct 2013) | 5
> > lines Minor optimization in svn_membuf code. *
> > subversion/libsvn_subr/string.c (membuf_create): always allocate, even
> > 0 bytes (svn_membuf__resize): new allocated membuf->data can never be
> > NULL Index: subversion/libsvn_subr/string.c
>
> The result, as I see it, pessimises the code in many scenarios where
> membufs are used as a buffer whose size is unknown at creation time.
> True, we save some time at every resize. The time saved is never more
> than one clock cycle, and probably far less on average (with parallel
> issue); checking (membuf->data != 0) will take that much time once its
> value is loaded into a register, and your changed condition does not
> avoid the load.
>
> On the other hand, you pay for this "optimization" by always calling
> apr_palloc on membuf creation, even if the buffer size is 0 and will
> therefore probably be resized, and apr_palloc called again, almost
> immediately. I fail to see how you can justify this extra call to
> apr_palloc.
>
Welcome to the wonder that is the modern microprocessor.
Out of sheer curiosity, I ran callgrind's cache and branch predictor
simulation on SVN (don't remember the actual operation). membuf_t
came up as a branch misprediction hotspot.
Problem: Whenever we check for 0 / NULL in svn_membuf__resize,
the CPU must guess in advance what the outcome will be. Zero
costs when it gets it right, ~20 ticks penalty when it guess wrong.
Unless the code strictly follows a simple pattern ("always false", "always
true", "true, then false, then true, then false" etc.), there will be at
least
one misprediction: either from non-NULL to NULL or vice versa.
So, unless your pattern is really strict, the "abnormal" NULL case
will cause at least one 20 tick penalty.
OTOH, apr_palloc takes only about 10 ticks when allocating 0 bytes
(call, return and all other jumps in that function are highly predictable).
Not skipping that call is about twice as fast as skipping and stumbling.
To clarify, the most often used pattern where the initial membuf size os
> 0 is when normalizing UTF-8 strings, where we let the utf8proc code
> determine how large the allocation has to be, based on its analysis of
> the string; the only alternative is to allocate a far larger buffer than
> you can ever need, and incidentally making assumptions about how the
> normalization is implemented. The extra allocation you introduced here
> does not speed anything up; rather the opposite.
>
It is not an extra allocation. For 0 bytes we simply get a valid pointer
but the next allocation will return the same pointer. So, there is no waste.
If your pattern is truely fixed, "alloc 0, alloc final" and not "alloc 0,
alloc much, sometimes alloc more", the original code would be
slightly faster as explained above. For other uses of membuf_t,
the 0 size case is still an anomaly.
-- Stefan^2.
Received on 2013-12-22 14:17:21 CET