Hi Bert,
You are right. I tweaked it a bit and now we should get
pretty good coverage.
But even without that change, the collision rates that
I observed, monitoring the retries in APR, went up only
mildly and cost only a tiny fraction of what had been
was saved.
We use hashes for revprops and path names. For those,
you might almost just pick the 5th last byte as your hash
key (or something silly like that) and be fine.
-- Stefan^2
On Thu, May 16, 2013 at 6:31 PM, Bert Huijben <bert_at_qqmail.nl> wrote:
>
>
> > -----Original Message-----
> > From: stefan2_at_apache.org [mailto:stefan2_at_apache.org]
> > Sent: donderdag 16 mei 2013 18:15
> > To: commits_at_subversion.apache.org
> > Subject: svn commit: r1483434 -
> > /subversion/trunk/subversion/libsvn_subr/hash.c
> >
> > Author: stefan2
> > Date: Thu May 16 16:15:28 2013
> > New Revision: 1483434
> >
> > URL: http://svn.apache.org/r1483434
> > Log:
> > On machines that allow for unaligned access, speed up hash ops
> > with known key length at the expense of those where the key length
> > is unknown.
> >
> > Background: Efficient hash functions computation is essential for
> > performance in many parts of SVN. Many of the frequently callers
> > them have already been changed to provide the key length in.
> >
> > * subversion/libsvn_subr/hash.c
> > (hashfunc_compatible): use a slightly different but faster
> > formula when the CPU supports that
> >
> > Modified:
> > subversion/trunk/subversion/libsvn_subr/hash.c
> >
> > Modified: subversion/trunk/subversion/libsvn_subr/hash.c
> > URL:
> > http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/ha
> > sh.c?rev=1483434&r1=1483433&r2=1483434&view=diff
> > ==========================================================
> > ====================
> > --- subversion/trunk/subversion/libsvn_subr/hash.c (original)
> > +++ subversion/trunk/subversion/libsvn_subr/hash.c Thu May 16 16:15:28
> > 2013
> > @@ -600,37 +600,26 @@ hashfunc_compatible(const char *char_key
> > apr_ssize_t i;
> >
> > if (*klen == APR_HASH_KEY_STRING)
> > - {
> > - for (p = key; ; p+=4)
> > - {
> > - unsigned int new_hash = hash * 33 * 33 * 33 * 33;
> > - if (!p[0]) break;
> > - new_hash += p[0] * 33 * 33 * 33;
> > - if (!p[1]) break;
> > - new_hash += p[1] * 33 * 33;
> > - if (!p[2]) break;
> > - new_hash += p[2] * 33;
> > - if (!p[3]) break;
> > - hash = new_hash + p[3];
> > - }
> > - for (; *p; p++)
> > - hash = hash * 33 + *p;
> > + *klen = strlen(char_key);
> >
> > - *klen = p - key;
> > +#if SVN_UNALIGNED_ACCESS_IS_OK
> > + for (p = key, i = *klen; i >= 4; i-=4, p+=4)
> > + {
> > + hash = hash * 33 * 33 * 33 * 33
> > + + *(apr_uint32_t *)p;
>
> The distribution of this hash is far less than the original hash. The
> higher bytes in *p don't reach the lower hash value.
>
> This might help in the hash calculation performance, but it could
> seriously reduce the working of the hashtable using the keys.
>
> Bert
>
> > }
> > - else
> > +#else
> > + for (p = key, i = *klen; i >= 4; i-=4, p+=4)
> > {
> > - for (p = key, i = *klen; i >= 4; i-=4, p+=4)
> > - {
> > - hash = hash * 33 * 33 * 33 * 33
> > - + p[0] * 33 * 33 * 33
> > - + p[1] * 33 * 33
> > - + p[2] * 33
> > - + p[3];
> > - }
> > - for (; i; i--, p++)
> > - hash = hash * 33 + *p;
> > + hash = hash * 33 * 33 * 33 * 33
> > + + p[0] * 33 * 33 * 33
> > + + p[1] * 33 * 33
> > + + p[2] * 33
> > + + p[3];
> > }
> > +#endif
> > + for (; i; i--, p++)
> > + hash = hash * 33 + *p;
> >
> > return hash;
> > }
> >
>
>
>
--
*Join one of our free daily demo sessions on* *Scaling Subversion for the
Enterprise <http://www.wandisco.com/training/webinars>*
*
*
Received on 2013-05-16 19:58:09 CEST