
Re: Using svn_hash__make instead of apr_hash_make

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Sun, 20 May 2012 13:37:12 +0200

Greg Stein wrote:
> I thought the whole reason for variant results was to avoid O(N^2)
> attacks against the hash table. You would be defeating that work.

No. They (APR) did not solve the problem.
For instance, if all keys have the same length,
the new seed approach will simply rotate
the buckets: keys that collided before still
collide.
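The rotation effect can be checked with a small model. The sketch below assumes that the seed only replaces the initial value of the classic "times 33" string hash (hash = hash * 33 + byte), which is how the remark above reads. It is a model, not APR source, and the function names are hypothetical. For a key of length n, the seed adds the same term, seed * 33^n, to every key of that length. Keys that collide without a seed therefore collide for every seed, and each bucket index is shifted by the same amount modulo the table size.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not APR source) of a seeded
 * "times 33" string hash: the per-table seed only replaces the
 * initial hash value.
 */
static unsigned int seeded_times33(const char *key, unsigned int seed)
{
    unsigned int hash = seed;
    for (const unsigned char *p = (const unsigned char *)key; *p; ++p)
        hash = hash * 33u + *p;   /* unsigned arithmetic wraps around */
    return hash;
}

/* Bucket index for a table with a power-of-two size (mask = size - 1). */
static unsigned int bucket_of(const char *key, unsigned int seed,
                              unsigned int mask)
{
    return seeded_times33(key, seed) & mask;
}
```

For example, "aB" and "b!" both hash to 3267 with a zero seed (97*33 + 66 == 98*33 + 33). They land in the same bucket whatever seed a table picks.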

APR basically went down the
obscure-and-pray-really-hard road, so there is
nothing to defeat right now. An actual fix would,
for example,

* use a hash implementation with O(1) space
   and runtime guarantees, or
* use a bucket implementation that switches
   from a linear list to a tree above 4 entries

The latter is the solution with the least impact,
but APR has no b-tree or similar structure that
it could simply use here.
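The second option can be sketched as follows, under stated assumptions. The bucket type, function names and BUCKET_LIST_MAX threshold are hypothetical. Keys are stored by pointer rather than copied. POSIX tsearch()/tfind()/tdelete() from <search.h> stand in for the tree. glibc implements those as a red-black tree, but POSIX does not require balancing, and <search.h> is not available on every platform APR supports. This is therefore an illustration, not something APR could drop in as-is.

```c
#define _XOPEN_SOURCE 700   /* expose tsearch() and friends */
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold: a bucket stays a linked list up to 4 keys. */
#define BUCKET_LIST_MAX 4

typedef struct list_node {
    const char *key;            /* stored by pointer, not copied */
    struct list_node *next;
} list_node;

/* One hash bucket holding string keys (a set, to keep things short). */
typedef struct {
    size_t count;
    list_node *list;            /* holds the keys while count <= 4 */
    void *tree;                 /* tsearch() root once past the threshold */
} bucket;

static int key_cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

static int bucket_contains(const bucket *b, const char *key)
{
    if (b->tree && tfind(key, &b->tree, key_cmp))
        return 1;
    /* Linear scan; the list is empty once everything lives in the tree. */
    for (const list_node *n = b->list; n; n = n->next)
        if (strcmp(n->key, key) == 0)
            return 1;
    return 0;
}

/* Returns 0 on success (duplicates are ignored), -1 if out of memory. */
static int bucket_insert(bucket *b, const char *key)
{
    if (bucket_contains(b, key))
        return 0;

    if (!b->tree && b->count < BUCKET_LIST_MAX) {
        list_node *n = malloc(sizeof *n);
        if (!n)
            return -1;
        n->key = key;
        n->next = b->list;
        b->list = n;
        b->count++;
        return 0;
    }

    /* Past the threshold: move any remaining list entries into the tree.
     * A failed move leaves the entry in the list, where lookups still
     * find it, and a later insert retries the move. */
    while (b->list) {
        list_node *n = b->list;
        if (!tsearch(n->key, &b->tree, key_cmp))
            return -1;
        b->list = n->next;
        free(n);
    }

    if (!tsearch(key, &b->tree, key_cmp))
        return -1;
    b->count++;
    return 0;
}

static void bucket_clear(bucket *b)
{
    while (b->list) {
        list_node *n = b->list;
        b->list = n->next;
        free(n);
    }
    /* The first field of every tsearch() node points to its key. */
    while (b->tree)
        tdelete(*(const void *const *)b->tree, &b->tree, key_cmp);
    b->count = 0;
}
```

Past the threshold, lookups in a bucket cost O(log n) comparisons even when an attacker forces many keys into it, provided the tree really is balanced.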

> I see no reason to use a stable hash. If outputs are supposed to be
> stable, then the *presentation* layer should be stabilizing. Not our
> core data structure.
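Stabilizing at the presentation layer essentially means sorting entries before they are printed. The following is a minimal sketch in plain C. It assumes the keys have already been collected from a hash iteration into an array. That collection step and the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int compare_keys(const void *a, const void *b)
{
    const char *ka = *(const char *const *)a;
    const char *kb = *(const char *const *)b;
    return strcmp(ka, kb);
}

/*
 * Sort keys in place so that output no longer depends on hash order.
 * 'keys' stands in for keys gathered from a hash iteration.
 */
static void stabilize_for_output(const char **keys, size_t n)
{
    qsort(keys, n, sizeof *keys, compare_keys);
}
```

Each output site that needs a stable order has to perform this step. The question in this thread is whether that is preferable to a stable hash function underneath.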

Directory deltification shrinking wordpress.org
from 400+ GB to 10 GB *is* a reason.
Without stable hashes, we would need special
code for hash deltification.

I do agree that most parts of Subversion don't
*need* stable hashes. But having them makes
our lives much easier, e.g. by making test runs
repeatable.

> This is the same kind of silliness that stsp made to our core data
> structures. I still think that should be reverted in favor of
> presentation changes.

If Canadian "internet" had supported sending
e-mail (not just receiving), I would have posted this
more than a week ago and spared stsp this discussion.

Again, these are my reasons for using svn_hash__make:

* consistent behavior of SVN across different APR versions
* give devs time to check all the 500+ places that create
   hashes throughout SVN for implicit assumptions about
   ordering and such.
* performance improvement; particularly with directory-
   or property-related operations

None of these points have been invalidated by
your "let's do it in post" suggestion.

> Cheers,
> -g

-- Stefan^2.

> On May 19, 2012 8:46 PM, "Stefan Fuhrmann"
> <stefanfuhrmann_at_alice-dsl.de> wrote:
> Hi all,
> svn_hash__make is basically apr_hash_make with
> a hash function that will produce the same results
> (element order etc.) with APR 1.4.6 as well as earlier
> APR versions. Without it, different test runs will produce
> different results. As a bonus, we save some runtime.
> I have a patch sitting on my machine that will do a
> global replace of all 500+ locations where we create
> APR hashes. Often, we need to add another #include.
> There are no other changes, but the patch is large.
> If there are no strong objections, I will commit the
> change next weekend.
> -- Stefan^2.
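For illustration, here is a hash function with the shape APR expects for a custom hash (apr_hashfunc_t, the type apr_hash_make_custom() accepts). It reproduces the unseeded "times 33" function that APR used by default before 1.4.6. This is an assumption-laden sketch, not the svn_hash__make implementation, which per the message is also meant to be faster. ptrdiff_t stands in for apr_ssize_t, and the hypothetical KEY_STRING macro stands in for APR_HASH_KEY_STRING, so the sketch compiles without APR headers.

```c
#include <stddef.h>
#include <string.h>

/* Same value as APR's APR_HASH_KEY_STRING ("key is NUL-terminated"). */
#define KEY_STRING ((ptrdiff_t)-1)

/*
 * Unseeded "times 33" hash, as APR computed it before 1.4.6.  The
 * signature mirrors apr_hashfunc_t, with ptrdiff_t in place of
 * apr_ssize_t.
 */
static unsigned int stable_hashfunc(const char *char_key, ptrdiff_t *klen)
{
    const unsigned char *key = (const unsigned char *)char_key;
    unsigned int hash = 0;

    if (*klen == KEY_STRING)
        *klen = (ptrdiff_t)strlen(char_key);  /* report the length back */

    for (ptrdiff_t i = 0; i < *klen; ++i)
        hash = hash * 33u + key[i];
    return hash;
}
```

With APR headers available, the parameter type would be apr_ssize_t and the function could be passed as apr_hash_make_custom(pool, stable_hashfunc). Because nothing is seeded, the hash values, and hence bucket placement, are the same in every run.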
Received on 2012-05-20 15:38:11 CEST

This is an archived mail posted to the Subversion Dev mailing list.