
Re: fs-rep-sharing branch

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: Fri, 24 Oct 2008 20:31:33 -0400

So just to be clear, you've gone from "this will never happen" to "I
don't care about these people"?

On Fri, Oct 24, 2008 at 4:31 PM, Greg Stein <gstein_at_gmail.com> wrote:
> Hunh?! I don't understand how you got 4 billion from 2^59.
>
> And, personally, I'm not too worried about repositories with 4 billion files,
> let alone 2^59 files.
>
> Cheers,
> -g
>
>
>
> On Oct 23, 2008, at 18:29, "Daniel Berlin" <dberlin_at_dberlin.org> wrote:
>
>> It's definitely not 2^128
>>
>> Assuming a perfectly even distribution (which md5 doesn't have), the
>> birthday paradox means you should expect a collision after about
>> 1.25 * sqrt(x) outputs, where x is the number of possible hash values.
>>
>> sqrt(2^128) = 2^64
>>
>> Probabilistically, you have a 1% chance of collision after ~2^59
>> nodes, and it grows fairly quickly from there:
>> 25% chance after 2^60, 50% chance after 2^61, etc
>>
>> Again, this is all with a perfectly even distribution. If MD5's
>> distribution is, say, "half as good as perfect", you will get
>> collisions with a little more than 4 billion files.
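Here is a quick numeric check of the birthday-bound arithmetic above. The sketch uses the standard approximation p ≈ 1 - exp(-n^2 / (2N)) for n outputs drawn uniformly from N = 2^bits values. That is the same perfectly-even-distribution assumption stated in the message. The function names are illustrative only.

Under this approximation, the 128-bit probabilities come out lower than the figures quoted:

- about 0.05% at 2^59 files
- about 0.8% at 2^61
- about 39% at 2^64

The 1% mark falls near 2^61.2. The "a little more than 4 billion" figure is consistent with reading "half as good" as an effective 64-bit hash. In that case the expected first collision lands near 1.25 * 2^32, roughly 5.4 billion.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit hashes contain
    at least one collision: 1 - exp(-n^2 / (2 * 2^bits))."""
    space = 2 ** bits
    exponent = (n * n) / (2 * space)  # exact big-int arithmetic, then one division
    return -math.expm1(-exponent)     # 1 - e^-x, accurate even for tiny x


def expected_draws_to_collision(bits: int) -> float:
    """Expected number of outputs before the first collision: sqrt(pi * N / 2)."""
    return math.sqrt(math.pi * 2 ** bits / 2)


if __name__ == "__main__":
    for e in range(59, 66):
        p = collision_probability(2 ** e, 128)
        print(f"2^{e} files: {p:.4%}")
    # sqrt(pi / 2) is about 1.2533, the "1.25 * sqrt(x)" factor quoted above.
    print(expected_draws_to_collision(128) / 2 ** 64)
    # Assumption: "half as good" read as an effective 64-bit hash.
    print(f"{expected_draws_to_collision(64):.3e}")
```

Because the exponent grows with n^2, each doubling of the file count roughly quadruples a small collision probability.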
>>
>> In any case, a 64-bit number of files is getting within the
>> reach of large systems.
>> We should move to SHA1, which is in the "universe size number of files"
>> range.
>>
>>
>> On Tue, Oct 21, 2008 at 9:50 PM, Greg Stein <gstein_at_gmail.com> wrote:
>>>
>>> There is a HUGE difference between constructing two files with the
>>> same md5 in order to falsify a signature, and that of two files in a
>>> repository having the same md5 hash by accident.
>>>
>>> Sit down and look at the odds: 1 in 2^128. If I understand my powers
>>> of two properly, I believe that means the earth is more likely to
>>> spontaneously explode than two files are to have the same hash key.
>>>
>>> Cheers,
>>> -g
>>>
>>> On Tue, Oct 21, 2008 at 3:57 PM, David Glasser <glasser_at_davidglasser.net>
>>> wrote:
>>>>
>>>> As far as I can tell from reading the source, this (at least in FSFS)
>>>> assumes that reps sharing the same md5 are the same file. (BDB seems
>>>> to use sha1.)
>>>>
>>>> This means that you cannot store two files with the same md5 in the
>>>> same repository. While obviously all hashes have collisions in
>>>> theory, md5 has collisions in practice: there are known instances.
>>>> And you know, cryptography researchers use Subversion! (At one point
>>>> I tried to help fix Ron Rivest's corrupted svn repo...) I do not
>>>> think that this limitation is appropriate for Subversion; I would
>>>> highly advise against releasing this without changing FSFS to use SHA
>>>> as well. (I can't find a mailing-list discussion of this choice; my
>>>> apologies if I missed one, I have admittedly been not paying as much
>>>> attention as I'd like to Subversion development recently.)
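The concern can be illustrated with a minimal, hypothetical sketch of hash-keyed representation sharing. This is not FSFS code; `RepStore`, `md5_hex` and `sha1_hex` are invented for illustration.

If the index trusts a digest match on its own, two different files with the same digest are merged into one rep. Comparing the stored bytes on a digest hit avoids that. Keying on a stronger hash such as SHA-1 makes such a hit far less likely.

```python
import hashlib
from typing import Callable, Dict, List


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class RepStore:
    """Hypothetical content-addressed store: identical contents share one rep."""

    def __init__(self, digest: Callable[[bytes], str] = sha1_hex, verify: bool = True):
        self.digest = digest                     # hash used as the sharing key
        self.verify = verify                     # compare bytes on a digest hit?
        self.reps: List[bytes] = []              # rep id -> stored contents
        self.index: Dict[str, List[int]] = {}    # digest -> rep ids with that digest

    def store(self, data: bytes) -> int:
        key = self.digest(data)
        for rep_id in self.index.get(key, []):
            # Without verification a digest match alone is trusted, which is
            # exactly what goes wrong when two different files collide.
            if not self.verify or self.reps[rep_id] == data:
                return rep_id  # share the existing rep
        self.reps.append(data)
        rep_id = len(self.reps) - 1
        self.index.setdefault(key, []).append(rep_id)
        return rep_id

    def read(self, rep_id: int) -> bytes:
        return self.reps[rep_id]
```

The failure mode can be reproduced with a stand-in digest that always collides, such as `lambda data: "same"`. With `verify=False`, storing `b"y"` after `b"x"` hands back the rep holding `b"x"`, so reading the second file silently returns the first one's contents. The byte comparison costs an extra read on every digest hit.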
>>>>
>>>> --dave
>>>>
>>>> On Mon, Oct 6, 2008 at 8:59 PM, Hyrum K. Wright
>>>> <hyrum_wright_at_mail.utexas.edu> wrote:
>>>>>
>>>>> The fs-rep-sharing branch is functionally complete, and I'd like to get
>>>>> the branch merged to trunk soon. These are the stats for various copies
>>>>> of our repository for the different branch/backend combinations.
>>>>>
>>>>> BDB:   1.5:         1.4GB
>>>>>        trunk:       627MB
>>>>>        reps-shared: 490MB
>>>>>
>>>>> FSFS:  1.5:         586MB
>>>>>        trunk:       578MB
>>>>>        reps-shared: 523MB
>>>>>
>>>>> The effect is quite pronounced on BDB, with around 20% space savings
>>>>> compared with our current trunk (and over 67% compared with 1.5!). FSFS
>>>>> doesn't show as much improvement, partly due to the size of the index
>>>>> required to enable rep-sharing, partly due to decreased sharing
>>>>> opportunities in same-revision and parallel-revision objects, and mostly
>>>>> due to the absolute floor on repo size imposed by inode usage.
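The percentages can be recomputed from the sizes listed above. This small sketch assumes 1GB = 1024MB. The 1.4GB figure is itself rounded, so the comparison against 1.5 is only approximate.

```python
def savings(before_mb: float, after_mb: float) -> float:
    """Fraction of space saved going from before_mb to after_mb."""
    return 1 - after_mb / before_mb


# Repository sizes in MB from the table above (1.4GB taken as 1.4 * 1024 MB).
sizes = {
    "BDB":  {"1.5": 1.4 * 1024, "trunk": 627, "reps-shared": 490},
    "FSFS": {"1.5": 586,        "trunk": 578, "reps-shared": 523},
}

for backend, s in sizes.items():
    vs_trunk = savings(s["trunk"], s["reps-shared"])
    vs_15 = savings(s["1.5"], s["reps-shared"])
    print(f"{backend}: {vs_trunk:.1%} vs trunk, {vs_15:.1%} vs 1.5")
```

This gives roughly 22% (BDB) and 9.5% (FSFS) savings against trunk. With the rounded 1.4GB value, the BDB saving against 1.5 comes out at about 66%.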
>>>>>
>>>>> We may be able to tune the FSFS implementation just a bit. For
>>>>> instance, directory content representations may not be likely to be
>>>>> shared, in which case we shouldn't bother with them.
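Here is a rough sketch of the index side, assuming a hypothetical SQLite table. The table, column and function names are illustrative, not the branch's actual schema.

A rep cache maps a content checksum to the location of an existing rep. The tuning idea above then amounts to never consulting or populating that index for directory representations.

```python
import sqlite3
from typing import Optional, Tuple


def open_rep_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: checksum -> location of an existing representation.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rep_cache ("
        " hash TEXT PRIMARY KEY,"
        " revision INTEGER NOT NULL,"
        " rep_offset INTEGER NOT NULL,"
        " size INTEGER NOT NULL)"
    )
    return conn


def find_shared_rep(conn: sqlite3.Connection, checksum: str,
                    is_dir: bool) -> Optional[Tuple[int, int, int]]:
    """Return (revision, rep_offset, size) of a reusable rep, or None."""
    if is_dir:
        return None  # directory contents are never shared in this sketch
    return conn.execute(
        "SELECT revision, rep_offset, size FROM rep_cache WHERE hash = ?",
        (checksum,),
    ).fetchone()


def record_rep(conn: sqlite3.Connection, checksum: str, revision: int,
               rep_offset: int, size: int, is_dir: bool) -> None:
    if is_dir:
        return  # skip indexing directory reps, keeping the index smaller
    # Keep the first rep recorded for a checksum; later duplicates are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO rep_cache (hash, revision, rep_offset, size)"
        " VALUES (?, ?, ?, ?)",
        (checksum, revision, rep_offset, size),
    )
    conn.commit()
```

Skipping directory reps trades a few missed sharing opportunities for a smaller index. That index size is one of the FSFS costs mentioned earlier in the message.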
>>>>>
>>>>> The remaining issue is the failing blame tests. Blame tests 10 and 11,
>>>>> which test 'blame -g', both fail for both backends. Before the recent
>>>>> commits to add rep-sharing to fsfs, the tests only failed for bdb. I'm
>>>>> slightly puzzled here because 'blame -g' should be FS-agnostic. If
>>>>> anybody has some insight, I welcome it.
>>>>>
>>>>> [Note: Because SQLite is still not an official dependency, to compile
>>>>> the rep-sharing stuff with FSFS, you'll need to add
>>>>> -DENABLE_SQLITE_TESTING to the CPPFLAGS when configuring.]
>>>>>
>>>>> -Hyrum
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
>>>> For additional commands, e-mail: dev-help_at_subversion.tigris.org
>>>>
>>>>
>>>
>>>
>>>
>

Received on 2008-10-25 02:32:04 CEST

This is an archived mail posted to the Subversion Dev mailing list.
