
Re: Using svn_hash__make instead of apr_hash__make

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Mon, 04 Jun 2012 12:15:28 +0200

Justin Erenkrantz wrote:
> On Tue, May 22, 2012 at 12:35 PM, Stefan Fuhrmann<eqfox_at_web.de> wrote:
>> On a more general note: We don't use hashes as a means to
>> randomize our data. For us, they are simply containers with
>> an average O(1) insertion and lookup behavior. The APR interface
>> also allows for iterating that container - so it *has* an ordering
>> and it has been quite stable under different operations and
>> over many versions of APR.
> Iterating isn't the same as ordering.

You are right. I got carried away ...

>>> And, the third point doesn't make any sense to me without a further
>>> explanation. -- justin
>>>
>> When we e.g. do an "svn ls -v" (TSVN repo browser), we will
>> create and fill the revprop hash for the respective revision
>> multiple times for each entry in the directory - just to produce
>> a few bytes of output. The hash function showed up in profiles
>> with 10% of the total runtime.
> Sorry - I'm not following how having a sorted hash table achieves
> this; it sounds like reuse of a data structure (not requiring memory
> allocs or something) - not that it is guaranteed to be stable on every
> call.

This is not related to the "stable order" part but to performance:
hashfunc_compatible is a bit faster than the APR default.

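To make the performance point concrete, here is a hedged sketch. APR's default string hash is the well-known "times 33" (Bernstein-style) hash; a compatible variant can unroll the loop to process several bytes per iteration while producing bit-identical values. The function names below (hash_default, hash_unrolled) and the unroll factor are illustrative only; the actual svn hashfunc_compatible code may differ in detail.

```c
#include <stddef.h>

/* The "times 33" string hash that APR's default hash function is
 * based on: hash = hash * 33 + byte, over every byte of the key. */
static unsigned int
hash_default(const char *key, size_t klen)
{
    unsigned int hash = 0;
    size_t i;
    for (i = 0; i < klen; i++)
        hash = hash * 33 + (unsigned char)key[i];
    return hash;
}

/* A compatible variant: same algorithm, unrolled four bytes per
 * iteration, so there is less loop-control overhead per byte.  It
 * returns the same value as hash_default for every input. */
static unsigned int
hash_unrolled(const char *key, size_t klen)
{
    unsigned int hash = 0;
    const unsigned char *p = (const unsigned char *)key;

    for (; klen >= 4; klen -= 4, p += 4)
        hash = (((hash * 33 + p[0]) * 33 + p[1]) * 33 + p[2]) * 33 + p[3];
    for (; klen > 0; klen--, p++)
        hash = hash * 33 + *p;
    return hash;
}
```

Because the two functions compute identical values, such a replacement changes only the cost per lookup, not the bucket distribution.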
>> So, I tuned that. Because apr_hash_t is so popular in our code,
>> that very localized improvement will give (small) returns in
>> improved performance all over the place.
> Yes, but there may very well be penalties in other places that you
> haven't looked at yet. I know that I've seen shift+incr appear in
> traces before - switching to a much more expensive hash algorithm is
> going to make those cases worse. -- justin
>
I'm not switching to a much more expensive hash algorithm
(if that is what you are implying). And I'm not suggesting that
APR switch to a different function either.

However, I think that APR could detect the "huge bucket"
case with very little overhead - just check / update a counter
on the respective bucket. Once it detects the critical scenario,
it could switch the data structure for that bucket from a
linked list to e.g. a b-tree.
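The counter idea above might look roughly like the following. This is purely an illustrative sketch, not APR code: the bucket_t / entry_t types, the threshold constant, and the degenerate flag are all hypothetical, and the actual list-to-tree rebuild is only marked by a comment.

```c
#include <stddef.h>

#define BUCKET_DEGENERATE_THRESHOLD 8   /* hypothetical cutoff */

/* One entry in a bucket's collision chain. */
typedef struct entry_t {
    struct entry_t *next;
    const char *key;
    void *val;
} entry_t;

/* A bucket that tracks its own chain length.  Once the count crosses
 * the threshold, a real implementation would rebuild the chain as a
 * balanced structure (e.g. a b-tree); here we only set a flag. */
typedef struct bucket_t {
    entry_t *head;
    unsigned int count;
    int degenerate;   /* set once the chain grows too long */
} bucket_t;

static void
bucket_insert(bucket_t *b, entry_t *e)
{
    e->next = b->head;
    b->head = e;
    if (++b->count > BUCKET_DEGENERATE_THRESHOLD)
        b->degenerate = 1;   /* trigger list -> tree conversion here */
}
```

The per-insert cost is a single increment and compare, so well-behaved hashes pay almost nothing; only a bucket that actually degenerates ever pays for the conversion. (Java's HashMap later adopted essentially this scheme, converting long bins to red-black trees.)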

-- Stefan^2.
Received on 2012-06-04 12:16:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.