
Re: Re: dangerous implementation of rep-sharing cache for fsfs

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Fri, 25 Jun 2010 14:37:03 +0100

On Fri, Jun 25, 2010 at 1:45 PM, <michael.felke_at_evonik.com> wrote:
> Hello,
>
> I am actually more interested in finding a reliable solution
> than in discussing mathematics and probabilities.

You can just disable rep-sharing. You won't have the space savings,
but you'll feel better inside, knowing that the near-zero probability
of hash collisions is now nearer to zero. I'm sorry that you can't
have your cake and eat it too.
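
For reference, a minimal sketch of what disabling it looks like, assuming
a Subversion 1.6 FSFS repository whose per-repository configuration lives
in db/fsfs.conf under the repository root (check the comments in your own
fsfs.conf before relying on this):

```
[rep-sharing]
enable-rep-sharing = false
```

As far as I understand, this only affects commits made after the change;
revisions that already share representations are not rewritten.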

Subversion 1.6.x has been released for over 16 months, and is in use
by *millions* of users. We've yet to have a single complaint about
hash collisions. While you may argue that this anecdotal evidence is
not a proof of correctness, I would claim that in this case, it is a
pretty good indicator.

> ...

> So there are 256^1024 = 1.09*10^2466 different data sequences
> of 1K size.
> This means for every hash value there are
> (256^1024)/(2^128)
> = (2^(8*1024))/(2^128)
> = (2^(8192))/(2^128)
> = 2^(8192-128)
> = 2^8064
> = 3.21*10^2427 sequences of data of 1K size
> represented by the same hash value.

When you find a disk which will hold even a significant fraction of
these 3.21 * 10 ^ 2427 1K sequences, let's talk. :)
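
The quoted arithmetic is easy to check with exact integers. The following
is a rough sketch in Python, assuming the 128-bit hash width and 1K block
size used in the quoted message; the names HASH_BITS, BLOCK_BYTES and sci
are made up for this illustration and are not part of Subversion.

```python
HASH_BITS = 128      # hash width assumed in the quoted message
BLOCK_BYTES = 1024   # "1K" data sequences


def sci(n: int) -> str:
    """Format a huge positive int as 'm.mm * 10^e' without using floats for n."""
    s = str(n)
    exponent = len(s) - 1
    # Use the leading 10 digits so that rounding to 2 decimals is reliable.
    mantissa = int(s[:10]) / 10 ** 9
    return f"{mantissa:.2f} * 10^{exponent}"


# Number of distinct 1K byte sequences: 256^1024 = 2^8192.
total_sequences = 256 ** BLOCK_BYTES

# Number of possible hash values for a 128-bit hash.
hash_values = 2 ** HASH_BITS

# Average number of 1K sequences mapping to each hash value: 2^8064.
per_hash = total_sequences // hash_values

print("distinct 1K sequences:  ", sci(total_sequences))
print("sequences per hash value:", sci(per_hash))
```

The point of the reply still stands: these counts describe all possible
inputs, not the data that any real repository will ever hold.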

-Hyrum
Received on 2010-06-25 15:37:46 CEST

This is an archived mail posted to the Subversion Dev mailing list.