On 16 May 2015 at 07:48, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> wrote:
> On Fri, May 15, 2015 at 7:25 PM, Philip Martin <philip.martin_at_wandisco.com>
> wrote:
>>
>> Philip Martin <philip.martin_at_wandisco.com> writes:
>>
>> > Philip Martin <philip.martin_at_wandisco.com> writes:
>> >
>> >> Prompted by the warnings I think there are some issues to fix. For
>> >> APR_HASH_KEY_STRING keys there is no protection against abnormally long
>> >> keys. combined_long_key() will allocate strlen() memory even if it is
>> >> many GB. The item will not get cached if key+data length is more than
>> >> 4GB but the memory for the key, which could be more than 4GB, will be
>> >> permanently allocated in the cache. There is also a problem with
>> >> overflow in membuffer_cache_set_internal() when calculating key+data
>> >> length, although in practice a key large enough to trigger this will
>> >> probably fail memory allocation first.
>> >
>> > Another issue:
>> >
>> > static svn_boolean_t
>> > entry_keys_match(const entry_key_t *lhs,
>> > const entry_key_t *rhs)
>> > {
>> > return (lhs->fingerprint[0] == rhs->fingerprint[0])
>> > && (lhs->fingerprint[1] == rhs->fingerprint[1])
>> > && (lhs->key_len == rhs->key_len);
>> > }
>> >
>> > I think the key_len comparison is wrong and should be removed. If two
>> > keys have fingerprints that collide it does not mean the key lengths are
>> > the same. When attempting to use two keys with colliding fingerprints
>> > the behaviour when the key lengths vary should be same as when the key
>> > lengths are the same and the key data varies.
>
>
> Not necessary. It is true that we could decide to *only* use
> the fingerprint as discriminator. But the implementation here
> includes the key length; it reduces the number of conflicts
> that result in entries that evict each other.
>
> Most fixed-length keys are 16 bytes long and for them, the
> fingerprint is almost identical to the key and conflicts are
> extremely unlikely. So, varible-length keys are the one that
> may have fingerprint conflicts now due to the weakness of
> FNV1. For them, however, the key_len (often derived from
> paths) has a good chance of being different.
>
> Note that if the key_len differs, it *cannot* be the same key.
> Hence, including the key_len does not create false negatives.
>
+1.
>> Another issue: find_entry() now calls drop_entry() in more cases and can
>> now call it when find_empty==FALSE during read operations. On Unix when
>> using the read-write lock this means the cache gets modified while only
>> holding a read lock, not a write lock, and that can corrupt the cache.
>>
As far I understand find_entry(find_empty == TRUE) requires write-lock
for cache, while only read-lock is required for find_empty == FALSE.
Is correct? It would be nice to document this in docstring.
>> If we use read-write locks then when read detects a fingerprint
>> collision it cannot modify the cache to clear the collision, it will
>> have to return NULL and leave the entry in the cache.
>
>
> You are absolutely right. Fixed in r1679681.
>
PS: Just to make things clear: branch has been changed so my +1
doesn't apply to it.
--
Ivan Zhakov
Received on 2015-05-16 19:46:02 CEST