Re: apr_hash_overlay returns hash with duplicate keys

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Fri, 11 Dec 2015 00:00:01 +0100

On 10.12.2015 18:23, Ivan Zhakov wrote:
> On 10 December 2015 at 20:20, Julian Foad <julianfoad_at_apache.org> wrote:
>> Ivan Zhakov wrote:
>>> On 10 December 2015 at 19:14, Julian Foad <julianfoad_at_apache.org> wrote:
>>>> APR devs, Subversion devs:
>>>>
>>>> On Subversion's Mac OS buildbots it appears that apr_hash_overlay()
>>>> sometimes returns a hash containing duplicate keys, which (as I
>>>> understand it) should be impossible.
>>>>
>>>> We had an issue where some 'svnmover' tests were failing only on Mac
>>>> OS buildbots. I added some debugging in Subversion commits r1719056,
>>>> r1719067, r1719072, r1719074.
>>>>
>>>> Buildbot result:
>>>> https://ci.apache.org/builders/svn-x64-macosx-bdb/builds/485/steps/Test%20ra_svn%2Bbdb
>>>> --> debug output in 'faillog' shows duplicate keys in hash:
>>>> "union_children={A, iota, foo, boozle, boozle, iota}"
>>>>
>>>> I replaced apr_hash_overlay() with my own simple re-implementation:
>>>>
>>>> http://svn.apache.org/r1719089 -- re-implement hash overlay
>>>>
>>> Hi Julian,
>>>
>>> That could be possible if two hashes uses different hash functions.
>>> This could happen if you're using svn_hash__make() directly or
>>> indirectly: for example RA get_dirent for svn:// protocol returns hash
>>> with non-standard hash-function.
>>
>> Ugh. Yes, that is probably the cause. Thanks.
>>
>> I can see now that the doc string of apr_hash_overlay() does say "Both
>> hash tables must use the same hash function" but that was not obvious,
>> and I had totally forgotten that our Subversion code was using more
>> than one hash function.

That API restriction has been removed in 1.4.6.

When APR introduced "randomized" hash functions for
each hash instance, they could no longer simply reuse
the hash values across instances.

>> I will use my own hash overlay function from now on.

You could add a version check and call the APR
library function for newer APR releases.

> I don't think this is proper fix for this problem. I think returning
> hash with non-standard hash function from public API is a bug (and API
> regression).

We always return a standard hash. What is different
is that we don't instantiate it with the *default*
hash function.

> Other API users may get to the same situation. So proper
> fix would be revert these optimizations from public API imo.

It is up to the APR API user to use that API correctly.
For instance, there is no guarantee anywhere that any
two apr_hash_t instances use the same hash function.

Furthermore, it is up to the APR project to keep their
API documentation in sync with the code such that users
won't jump through hoops needlessly.

-- Stefan^2.
Received on 2015-12-10 23:59:32 CET

This message: [ Message body ]
Next message: Branko Čibej: "Re: apr_hash_overlay returns hash with duplicate keys"
Previous message: Ivan Zhakov: "Re: apr_hash_overlay returns hash with duplicate keys"
In reply to: Ivan Zhakov: "Re: apr_hash_overlay returns hash with duplicate keys"
Next in thread: Branko Čibej: "Re: apr_hash_overlay returns hash with duplicate keys"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]