On Thu, Jun 20, 2013 at 10:03 PM, Mattias Engdegård wrote:
> On 20 Jun 2013, at 21:44, Stefan Fuhrmann wrote:
>> A capable compiler should unroll the inner loop
>> such that we end up with ~10 cycles / 4 bytes.
>> That would be slightly faster than the "* 33" loop.
> That depends on a lot of things (such as the latency/throughput of the
Of course. I checked the specs and SPARC turns out to be
2-issue, with T4 being OOO. And I simply assume that it
can handle one multiplication every 10 cycles ;) I'm more
worried about the compiler not being aggressive enough.
> By the way, the new inner loop suffers from signed overflow (undefined
> behaviour), and also sign extension when char is signed (which it is on
> SPARC). Both need to be fixed.
Good catch. There is a similar issue with the "*33" loop,
although in practice both should simply produce worse
hash distributions than necessary. Fixed in r1495204.
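For illustration, here is a minimal sketch of a "* 33" style loop that avoids both problems. It is not the code committed in r1495204; the function name hash_times33 is hypothetical. The bytes are read through an unsigned char pointer, so values of 0x80 and above are not sign-extended, and the accumulator is an unsigned type, so overflow wraps instead of being undefined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the committed Subversion code: a "* 33"
 * style string hash using only unsigned arithmetic. */
static uint32_t
hash_times33(const char *key, size_t len)
{
  /* Read bytes as unsigned char so that values >= 0x80 are not
   * sign-extended on platforms where plain char is signed. */
  const unsigned char *p = (const unsigned char *)key;

  /* Unsigned arithmetic wraps modulo 2^32; no undefined behaviour. */
  uint32_t hash = 0;
  size_t i;

  for (i = 0; i < len; ++i)
    hash = hash * 33u + p[i];

  return hash;
}
```

With a signed char and a signed accumulator, a byte such as 0xff would be added as -1 rather than 255, which is the sign-extension issue described above.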
>> I had preferred the other patch for its simplicity.
>> However, I'm fine with the current one and voted
>> for its backport to 1.8.x. It gives us target-independent
>> cache behavior - which is a good thing.
> No it doesn't. The code already produced different hashes on x86 and ppc
> because of differences in byte order.
You are right. I was not precise here: I meant SPARC uses
the same hash as x86 now.
After some IRC discussion, we added r1495209 which
provides actual platform-independence.
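One common way to make a chunked hash independent of the host byte order is to assemble each 4-byte chunk from individual bytes in a fixed order, instead of reading it through a wider pointer. The sketch below shows that technique only; whether r1495209 does it this way is an assumption, and load_chunk_le is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit chunk from 4 bytes in a fixed
 * (little-endian) order, regardless of the host's byte order. */
static uint32_t
load_chunk_le(const unsigned char *p)
{
  /* Each byte is widened to uint32_t before shifting, so the shifts
   * never operate on a (signed) promoted int. */
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}
```

Feeding chunks loaded this way into the hash yields the same value on x86, ppc and SPARC for the same input bytes.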
Received on 2013-06-21 00:33:19 CEST