
Re: dangerous implementation of rep-sharing cache for fsfs

From: Mark Mielke <mark_at_mark.mielke.cc>
Date: Fri, 25 Jun 2010 10:15:40 -0400

Rep-sharing collisions, even if they are possible because of collisions
occurring elsewhere in the world (has this ever happened in practice?),
still have no effect on a single repository that does not itself
experience the collision.
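To make the per-repository point concrete, here is a minimal sketch of content-addressed rep sharing. It assumes a simple in-memory map from SHA-1 hex digest to a stored representation; `RepCache` and its methods are hypothetical names, and Subversion's real cache (each repository's own `db/rep-cache.db`) is not modeled here. The point it shows is that a lookup only ever consults the digests already stored in that one repository.

```python
import hashlib


class RepCache:
    """Hypothetical per-repository rep cache: SHA-1 hex digest -> rep id.

    Illustrative stand-in only; this is not Subversion's rep-cache.db schema.
    """

    def __init__(self) -> None:
        self._by_sha1: dict[str, int] = {}  # digest -> index into _reps
        self._reps: list[bytes] = []        # stored representations

    def store(self, content: bytes) -> int:
        """Return a rep id, reusing an existing rep when the digest matches."""
        digest = hashlib.sha1(content).hexdigest()
        rep_id = self._by_sha1.get(digest)
        if rep_id is not None:
            # Digest already known in *this* repository: share the old rep.
            # A true collision would silently alias different content here.
            return rep_id
        rep_id = len(self._reps)
        self._reps.append(content)
        self._by_sha1[digest] = rep_id
        return rep_id

    def read(self, rep_id: int) -> bytes:
        return self._reps[rep_id]

    def __len__(self) -> int:
        return len(self._reps)


repo_a = RepCache()
repo_b = RepCache()
first = repo_a.store(b"hello")
second = repo_a.store(b"hello")  # same content: shared, not stored twice
print(first == second, len(repo_a), len(repo_b))
```

Because `repo_b` never sees anything stored in `repo_a`, a colliding pair of files that only exists in some other repository cannot affect it.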

There are many widely used systems that rely on statistical
improbability. Disk drive hardware failure is one example, and I don't
understand why Michael is discounting it. Another huge example is the
use of UUIDs. Take an SCM system like ClearCase and note that every
object it creates gets a UUID. Each UUID is effectively generated in
isolation according to some formula (location + time, or random), and
has a chance of collision as well (even a time-based one presumes that
machine time always moves forward).
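The two UUID flavours mentioned above can be inspected with Python's standard `uuid` module. This sketch shows that a version-1 UUID literally embeds its "location + time" inputs (a node id, normally the MAC address or a random stand-in if none is found, plus a 60-bit timestamp), along with a clock sequence that RFC 4122 includes to guard against a clock set backwards. A version-4 UUID has no such inputs and relies purely on randomness. The constant name `GREGORIAN_TO_UNIX_100NS` is our own label for the standard epoch offset.

```python
import time
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

time_based = uuid.uuid1()    # version 1: timestamp + clock sequence + node id
random_based = uuid.uuid4()  # version 4: 122 random bits

# A version-1 UUID carries its "location + time" inputs in plain sight.
unix_seconds = (time_based.time - GREGORIAN_TO_UNIX_100NS) / 10**7
print(time_based.version, hex(time_based.node), time_based.clock_seq)
print(time.ctime(unix_seconds))

# A version-4 UUID carries no such inputs; uniqueness rests on chance alone.
print(random_based.version, random_based.variant == uuid.RFC_4122)
```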

Why should Subversion solve a theoretical problem that doesn't seem to
exist in the real world?

I agree with Hyrum. If you don't like it, turn it off. I don't see the
problem, and I would prefer the developers work on *real*
*demonstrable* problems, like merge conflicts involving file renames.

Michael: Feel free to show a *real* repository where the rep-sharing
cache has caused corruption due to its use of SHA-1.
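How small is "statistically improbable" for SHA-1 keys? A rough sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and a b-bit hash. It assumes the digest behaves like a uniformly random function, so it says nothing about deliberately constructed collisions; the function name and the trillion-representation repository are hypothetical.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that n_items random hash values are not all distinct.

    Birthday bound: p ~= 1 - exp(-n(n-1) / 2^(bits+1)), assuming the hash
    acts like a uniformly random function (no deliberate attacks).
    """
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


# A hypothetical repository holding a trillion distinct representations,
# keyed by 160-bit SHA-1 digests: on the order of 1e-25.
print(collision_probability(10**12, 160))
```

As a sanity check, the same formula with an 8-bit "hash" (256 buckets) and 20 items gives roughly a coin flip, which matches the familiar birthday-paradox intuition.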


On 06/25/2010 09:37 AM, Hyrum K. Wright wrote:
> On Fri, Jun 25, 2010 at 1:45 PM, <michael.felke_at_evonik.com> wrote:
>> Hello,
>> I am actually more interested in finding a reliable solution
>> than in discussing mathematics and probabilities.
> You can just disable rep-sharing. You won't have the space savings,
> but you'll feel better inside, knowing that the near-zero probability
> of hash collisions is now nearer to zero. I'm sorry that you can't
> have your cake and eat it too.
> Subversion 1.6.x has been out for over 16 months, and is in use
> by *millions* of users. We've yet to have a single complaint about
> hash collisions. While you may argue that this anecdotal evidence is
> not a proof of correctness, I would claim that in this case, it is a
> pretty good indicator.
>> ...
>> So there are 256^1024 = 1.09*10^2466 different data sequences
>> of 1K size.
>> This means for every hash value there are
>> (256^1024)/(2^128)
>> = (2^(8*1024))/(2^128)
>> = (2^(8192))/(2^128)
>> = 2^(8192-128)
>> = 2^8064
>> = 3.21*10^2427 sequences of data of 1K size
>> represented by the same hash value.
> When you find a disk which will hold even a significant fraction of
> these 3.21 * 10 ^ 2427 1K sequences, let's talk. :)
> -Hyrum
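The pigeonhole arithmetic quoted above can be checked exactly with Python's arbitrary-precision integers. This is a sketch; `sequences_per_digest` and `sci` are hypothetical helper names. Note that the quoted calculation assumes a 128-bit hash value, while SHA-1 digests are 160 bits.

```python
def sequences_per_digest(block_bytes: int, hash_bits: int) -> int:
    """Average number of distinct blocks of block_bytes bytes per hash value."""
    # 256^k / 2^b is exact because 256^k = 2^(8k) and 8k >= b here.
    return 256**block_bytes // 2**hash_bits


def sci(n: int) -> tuple[float, int]:
    """Return (mantissa rounded to 2 places, decimal exponent) of a positive int."""
    digits = str(n)
    mantissa = round(int(digits[:15]) / 10**14, 2)
    return mantissa, len(digits) - 1


print(256**1024 == 2**8192)              # all 1K blocks: 2^8192
print(sci(256**1024))                    # (1.09, 2466)
print(sci(sequences_per_digest(1024, 128)))  # (3.21, 2427)
```

Calling `sequences_per_digest(1024, 160)` gives 2^8032 for a 160-bit digest; the count is still astronomically large, which is Hyrum's point about disk capacity.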

Mark Mielke <mark_at_mielke.cc>
Received on 2010-06-25 16:16:24 CEST

This is an archived mail posted to the Subversion Dev mailing list.