Re: fs-rep-sharing branch

From: Greg Stein <gstein_at_gmail.com>
Date: Fri, 24 Oct 2008 16:31:27 -0400

Hunh?! I don't understand how you got 4 billion from 2^59.

And, personally, I'm not too worried for repositories with 4 billion
files, let alone 2^59 files.

Cheers,
-g
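
The birthday estimate in Daniel Berlin's message quoted below can be checked numerically. This minimal Python sketch assumes an ideal hash whose outputs are uniformly distributed over 2^bits values; the real MD5 and SHA-1 output distributions are not modelled. It evaluates the standard approximation p ~= 1 - exp(-n^2 / (2 * 2^bits)) for the chance that n stored items contain at least one collision. It also computes the inverse of that approximation and the expected number of items before the first collision, sqrt(pi/2 * 2^bits), which is about 1.25 * sqrt(2^bits). The function names are illustrative only.

```python
import math


def collision_probability(n: float, bits: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit digests
    contain at least one repeat: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    space = 2.0 ** bits
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-(n * n) / (2.0 * space))


def items_for_probability(p: float, bits: int) -> float:
    """Invert the approximation: n ~= sqrt(2 * 2^bits * ln(1 / (1 - p)))."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * -math.log1p(-p))


def expected_items_until_collision(bits: int) -> float:
    """Expected number of digests drawn until the first repeat:
    sqrt(pi / 2 * 2^bits), i.e. about 1.25 * sqrt(2^bits)."""
    return math.sqrt(math.pi / 2.0 * 2.0 ** bits)


if __name__ == "__main__":
    for k in (32, 59, 60, 61, 64):
        p = collision_probability(2.0 ** k, 128)
        print(f"128-bit hash, n = 2^{k}: p ~= {p:.3g}")
    for target in (0.01, 0.25, 0.5):
        for name, bits in (("MD5", 128), ("SHA-1", 160)):
            n = items_for_probability(target, bits)
            print(f"{name}: p = {target:.0%} near n = 2^{math.log2(n):.1f}")
    first = expected_items_until_collision(128)
    print(f"expected first 128-bit collision near n = 2^{math.log2(first):.2f}")
```

Under this approximation, a 128-bit hash gives a collision chance of about 0.05% at 2^59 items, roughly 1% near 2^61.2 items, 25% near 2^63.6 items and 50% near 2^64.2 items. At 2^32 items (about 4 billion), the chance is around 3e-20. A 160-bit digest such as SHA-1 raises each of those item counts by a factor of 2^16.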

On Oct 23, 2008, at 18:29, "Daniel Berlin" <dberlin_at_dberlin.org> wrote:

> It's definitely not 2^128.
>
> Assuming a perfectly even distribution (which md5 doesn't have), the
> birthday paradox means you should expect a collision after about
> 1.25 * sqrt(x) outputs, where x is the number of possible hash values.
>
> sqrt(2^128) = 2^64
>
> Probabilistically, you have a 1% chance of collision after ~2^59
> nodes, and it grows fairly quickly from there:
> 25% chance after 2^60, 50% chance after 2^61, etc.
>
> Again, this is all with a perfectly even distribution. If MD5's
> distribution is, say, "half as good as perfect", you will get
> collisions with a little more than 4 billion files.
>
> In any case, having a 64-bit number of files is getting within the
> reach of large systems.
> We should move to SHA1, which is in the "universe size number of
> files" range.
>
>
> On Tue, Oct 21, 2008 at 9:50 PM, Greg Stein <gstein_at_gmail.com> wrote:
>> There is a HUGE difference between constructing two files with the
>> same md5 in order to falsify a signature, and that of two files in a
>> repository having the same md5 hash by accident.
>>
>> Sit down and look at the odds. 1 in 2^128. If I understand my powers
>> of two properly, I believe that means the earth is more likely to
>> spontaneously explode than for two files to have the same hash key.
>>
>> Cheers,
>> -g
>>
>> On Tue, Oct 21, 2008 at 3:57 PM, David Glasser <glasser_at_davidglasser.net> wrote:
>>> As far as I can tell from reading the source, this (at least in FSFS)
>>> assumes that reps sharing the same md5 are the same file. (BDB seems
>>> to use sha1.)
>>>
>>> This means that you cannot store two files with the same md5 in the
>>> same repository. While obviously all hashes have collisions in
>>> theory, md5 has collisions in practice: there are known instances.
>>> And you know, cryptography researchers use Subversion! (At one point
>>> I tried to help fix Ron Rivest's corrupted svn repo...) I do not
>>> think that this limitation is appropriate for Subversion; I would
>>> highly advise against releasing this without changing FSFS to use SHA
>>> as well. (I can't find a mailing-list discussion of this choice; my
>>> apologies if I missed one. I admittedly haven't been paying as much
>>> attention to Subversion development recently as I'd like.)
>>>
>>> --dave
>>>
>>> On Mon, Oct 6, 2008 at 8:59 PM, Hyrum K. Wright
>>> <hyrum_wright_at_mail.utexas.edu> wrote:
>>>> The fs-rep-sharing branch is functionally complete, and I'd like to
>>>> get the branch merged to trunk soon. These are the stats for various
>>>> copies of our repository for the different branch/backend
>>>> combinations.
>>>>
>>>> BDB:  1.5:          1.4GB
>>>>       trunk:        627MB
>>>>       reps-shared:  490MB
>>>>
>>>> FSFS: 1.5:          586MB
>>>>       trunk:        578MB
>>>>       reps-shared:  523MB
>>>>
>>>> The effect is quite pronounced on BDB, with around a 20% space
>>>> savings compared with our current trunk (and over 67% compared with
>>>> 1.5!) FSFS doesn't show as much improvement, partly due to the size
>>>> of the index required to enable rep-sharing, partly due to decreased
>>>> sharing opportunities in same-revision and parallel revision objects,
>>>> and mostly due to the absolute floor on repo size due to inode usage.
>>>>
>>>> We may be able to tune the FSFS implementation just a bit. For
>>>> instance, directory content representations may not be likely to be
>>>> shared, in which case we shouldn't bother trying to share them.
>>>>
>>>> The remaining issue is the failing blame tests. Blame tests 10 and
>>>> 11, which test 'blame -g', both fail for both backends. Before the
>>>> recent commits to add rep-sharing to fsfs, the tests only failed for
>>>> bdb. I'm slightly puzzled here because 'blame -g' should be
>>>> FS-agnostic. If anybody has some insight, I welcome it.
>>>>
>>>> [Note: Because SQLite is still not an official dependency, to compile
>>>> the rep-sharing stuff with FSFS, you'll need to add
>>>> -DENABLE_SQLITE_TESTING to the CPPFLAGS when configuring.]
>>>>
>>>> -Hyrum
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
>>> For additional commands, e-mail: dev-help_at_subversion.tigris.org
>>>
>>>
>>
>>
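
David Glasser's concern in the thread is that FSFS rep-sharing treats two representations with equal MD5 digests as the same file. The Python sketch below is a hypothetical, much-simplified model of digest-keyed sharing, not Subversion's code. The `RepCache` class, its integer rep ids and its `verify` flag are illustrative inventions. The digest function is pluggable (`md5_digest` or `sha1_digest`, both thin wrappers around Python's `hashlib`). A constant digest function stands in for a collision, since real MD5 collisions require specially constructed inputs.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


def sha1_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


class RepCache:
    """Hypothetical, much-simplified model of representation sharing.

    NOT the FSFS or BDB implementation: reps are byte strings in a list,
    identified by their index, and `by_digest` maps a content digest to
    the id of the first rep stored with that digest.
    """

    def __init__(self, digest_fn: Callable[[bytes], bytes], verify: bool = False):
        self.digest_fn = digest_fn
        self.verify = verify  # hypothetical option: compare bytes before sharing
        self.reps: List[bytes] = []
        self.by_digest: Dict[bytes, int] = {}

    def store(self, content: bytes) -> Tuple[int, bool]:
        """Store content and return (rep_id, shared)."""
        key = self.digest_fn(content)
        existing: Optional[int] = self.by_digest.get(key)
        if existing is not None:
            if not self.verify:
                # Equal digest is taken to mean equal content. If two different
                # files collide, the second silently gets the first file's rep.
                return existing, True
            if self.reps[existing] == content:
                return existing, True
            # Verified mismatch (a real collision): fall through and store
            # the new content as its own, unshared rep.
        self.reps.append(content)
        rep_id = len(self.reps) - 1
        self.by_digest.setdefault(key, rep_id)
        return rep_id, False

    def read(self, rep_id: int) -> bytes:
        return self.reps[rep_id]


if __name__ == "__main__":
    cache = RepCache(md5_digest)
    print(cache.store(b"same text"))  # (0, False): first copy is stored
    print(cache.store(b"same text"))  # (0, True): second copy shares rep 0
    # A constant digest function stands in for an MD5 collision.
    naive = RepCache(lambda data: b"collide")
    naive.store(b"file one")
    rep_id, _ = naive.store(b"file two")
    print(naive.read(rep_id))  # b'file one': the wrong content comes back
```

Switching the key from MD5 to SHA-1 only makes accidental collisions far less likely. The `verify` path shows the other option of comparing bytes before sharing, which in this sketch rules out returning the wrong content at the cost of reading back the stored rep.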

Received on 2008-10-24 22:32:18 CEST

This is an archived mail posted to the Subversion Dev mailing list.
