On Sun, Dec 22, 2013 at 2:52 PM, Branko Čibej <brane_at_wandisco.com> wrote:
> On 22.12.2013 14:16, Stefan Fuhrmann wrote:
> > On Mon, Dec 9, 2013 at 11:01 AM, Branko Čibej <brane_at_wandisco.com
> > <mailto:brane_at_wandisco.com>> wrote:
> > > To clarify, the most often used pattern where the initial membuf
> > > size is 0 is when normalizing UTF-8 strings, where we let the
> > > utf8proc code determine how large the allocation has to be, based
> > > on its analysis of the string; the only alternative is to allocate
> > > a far larger buffer than you can ever need, incidentally making
> > > assumptions about how the normalization is implemented. The extra
> > > allocation you introduced does not speed anything up; rather the
> > > opposite.
> > It is not an extra allocation. For 0 bytes we simply get a valid pointer
> > but the next allocation will return the same pointer. So, there is no
> > waste.
[Last post to this topic as this is *really* a minor change.]
> How on earth do you know that? Do you have a crystal ball that tells you
> that there will be no intervening allocations from the same pool?
Even if the active block in the pool has been completely
allocated (zero free memory), allocating 0 extra bytes is
free in the current implementation.
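
To illustrate the point, here is a minimal sketch of a bump-pointer
pool. It is NOT APR's code; toy_pool_t, toy_pool_init, toy_align and
toy_palloc are hypothetical names, and the 8-byte alignment is an
assumption made only for this example. It merely shows why a 0-byte
request can yield a valid pointer without consuming any space:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bump allocator; not APR's actual implementation. */
typedef struct toy_pool_t {
    char  *base;  /* start of the active block */
    size_t used;  /* bytes handed out so far */
    size_t size;  /* capacity of the active block */
} toy_pool_t;

static void toy_pool_init(toy_pool_t *pool, void *block, size_t size)
{
    assert(pool != NULL && block != NULL);
    pool->base = block;
    pool->used = 0;
    pool->size = size;
}

/* Round a request up to a multiple of 8 bytes (assumed alignment). */
static size_t toy_align(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

static void *toy_palloc(toy_pool_t *pool, size_t n)
{
    size_t aligned = toy_align(n);
    void *p;

    /* A real pool would fetch a new block here; the toy just fails. */
    if (aligned > pool->size - pool->used)
        return NULL;

    p = pool->base + pool->used;
    pool->used += aligned;  /* adds 0 for a 0-byte request */
    return p;
}
```

In this toy, a 0-byte request followed by a 16-byte request returns the
same address twice, and a 0-byte request against a completely full
block still succeeds without touching the "used" counter.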
> another one that tells you what will happen to APR's pool implementation
> in some future version?
I obviously can't tell - except that a major point of the APR
pool design is to be space efficient at the cost of being
unable to de-alloc selectively. If it ever were to add some
per-allocation overhead, that overhead would still be small
relative to the actual data buffer size.
> (On the other note about apr_palloc taking less time than a mispredicted
> conditional jump ... you're assuming that the apr_palloc code is in the
> L1 instruction cache,
Which it will be in most cases. If it is not, the initial allocation
will prime L1I for the following re-alloc. In general, SVN has
quite high L1I hit rates, i.e. high temporal code locality.
> and you're assuming that everyone uses Intel Core
apr_palloc latency is dominated by L1D latency. The latter
is usually subject to the same design forces as pipeline
depth. Even for embedded PPC, 2xL1D latency <= branch
> — and that everyone uses the same compiler you do. None of
> the above is likely to be true, in general.)
Well, with a good compiler, constant propagation will make
the old special-cased membuf_create() faster than the new one
calling apr_palloc (even if the latter gets a constant-propagated
code variant as well). The resize code is the place where
we can skip a NULL check.
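
A rough sketch of the two resize variants, for illustration only. It
uses malloc/free instead of pools, and toy_membuf_t, toy_init_nonnull,
toy_resize_checked and toy_resize_unchecked are hypothetical names,
not the actual membuf code. The difference is whether "data" may be
NULL while size is 0:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable buffer. */
typedef struct toy_membuf_t {
    void  *data;
    size_t size;
} toy_membuf_t;

/* Variant A: data may be NULL while size == 0, so resize must check
 * before copying (memcpy with a NULL source is undefined behavior,
 * even for a length of 0). */
static int toy_resize_checked(toy_membuf_t *buf, size_t new_size)
{
    void *p;

    if (new_size <= buf->size)
        return 0;
    p = malloc(new_size);
    if (p == NULL)
        return -1;
    if (buf->data != NULL)  /* the check under discussion */
        memcpy(p, buf->data, buf->size);
    free(buf->data);
    buf->data = p;
    buf->size = new_size;
    return 0;
}

/* Variant B: creation always stores a valid pointer, even for size 0.
 * One byte is requested because malloc(0) may return NULL. */
static int toy_init_nonnull(toy_membuf_t *buf)
{
    buf->data = malloc(1);
    buf->size = 0;
    return buf->data != NULL ? 0 : -1;
}

/* With the non-NULL invariant, the copy can be unconditional. */
static int toy_resize_unchecked(toy_membuf_t *buf, size_t new_size)
{
    void *p;

    assert(buf->data != NULL);  /* invariant set up at creation */
    if (new_size <= buf->size)
        return 0;
    p = malloc(new_size);
    if (p == NULL)
        return -1;
    memcpy(p, buf->data, buf->size);  /* no NULL check needed */
    free(buf->data);
    buf->data = p;
    buf->size = new_size;
    return 0;
}
```

Whether dropping that one branch is measurable is exactly what is in
dispute above; the sketch only shows where the branch would sit.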
Received on 2013-12-31 17:25:52 CET