Re: Files with identical SHA1 breaks the repo

From: Stefan Sperling <stsp_at_elego.de>
Date: Sun, 26 Feb 2017 20:06:04 +0100

On Sun, Feb 26, 2017 at 07:29:30PM +0100, Branko Čibej wrote:
> On 26.02.2017 18:26, Paul Hammant wrote:
> > Why don't y'all take the same tactic as Git does - SHA1 the contents of the
> > file *and a prepended type/length field*?
> And when the hash-colliding files happen to have the same type and
> length, as in the published collision...
> Ah, of course, Git is immune to that because it uses magic and pixie
> dust as well.

As far as I understand, Google's SHA1 collision relies on the specific
320 byte prefix which is shared by both PDF files being fed to SHA1
before any other data.

Git calculates a hash over 'blob LEN content-PDF-1' and 'blob LEN
content-PDF-2'. It is the identical 'blob LEN' parts which prevent
a collision of hashes of resulting git blob objects since they are
prefixed to the common 320 byte prefix.
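
To make the difference concrete, here is a minimal sketch in Python of the
two hashing schemes being compared. The function names are made up for
illustration. One detail not spelled out above: git's real blob header is
'blob ', the decimal length, and a NUL byte, so the sketch includes the NUL.

```python
import hashlib


def plain_sha1(content: bytes) -> str:
    """SHA-1 over the raw file content only, with no header."""
    return hashlib.sha1(content).hexdigest()


def git_blob_sha1(content: bytes) -> str:
    """SHA-1 the way git hashes a blob: header first, then the content."""
    # Header is "blob <decimal length>\0". These bytes enter SHA-1 before
    # any file data, so the file's own leading bytes are no longer the
    # first input the hash sees.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    data = b"hello\n"
    print("plain:", plain_sha1(data))
    print("git  :", git_blob_sha1(data))
```

The git value for a file should match what 'git hash-object <file>' prints.
Two files that collide under plain_sha1 would only collide under
git_blob_sha1 if the collision had been constructed with the header in mind.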

If another collision were found which triggers when content of two
files is prefixed with 'blob LEN' then git would have a problem.

> The bottom line is that any data storage system that uses lossy
> content-based indexing is vulnerable to hash collisions. And both
> Subversion and Git developers were well aware of that when the
> vulnerable features were designed. For normal, day-to-day usage, SHA-1
> collisions are no more likely now than they were a week ago.

Right. The problem we have is that none of us ever bothered to
instrument SVN's code to simulate a hash collision and test what
will happen. Of course we would expect only one of the contents to be
stored. But the system should not break in the way it does today.
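
As a rough illustration of the kind of test meant here, the sketch below
uses a toy content-addressed store in Python with a pluggable hash function.
This is not SVN code, and every name in it (ToyStore, CollisionError,
weak_hash, find_collision) is hypothetical. Swapping in a deliberately weak
hash makes collisions cheap to produce, so the store's behaviour can be
checked: the first content stays stored and the collision is reported
instead of corrupting anything.

```python
import hashlib
from typing import Callable, Dict, Tuple


class CollisionError(Exception):
    """Raised when two different contents map to the same key."""


class ToyStore:
    """Toy content-addressed store; the hash function is injectable."""

    def __init__(self, hash_fn: Callable[[bytes], str]) -> None:
        self._hash_fn = hash_fn
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = self._hash_fn(content)
        existing = self._objects.get(key)
        if existing is None:
            self._objects[key] = content
        elif existing != content:
            # Same key, different bytes: keep the first content and
            # report the collision rather than failing in an odd way later.
            raise CollisionError("collision on key " + key)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def weak_hash(content: bytes) -> str:
    """Deliberately weak: first 2 hex digits of SHA-1 (256 possible keys)."""
    return hashlib.sha1(content).hexdigest()[:2]


def find_collision(hash_fn: Callable[[bytes], str]) -> Tuple[bytes, bytes]:
    """Brute-force two distinct inputs with equal hash (weak hashes only)."""
    seen: Dict[str, bytes] = {}
    i = 0
    while True:
        data = b"file-%d" % i
        key = hash_fn(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
        i += 1


if __name__ == "__main__":
    first, second = find_collision(weak_hash)
    store = ToyStore(weak_hash)
    key = store.put(first)
    try:
        store.put(second)
    except CollisionError as err:
        print("detected:", err)
    print("still stored:", store.get(key) == first)
```

Whether a real system should reject the second content, as this sketch does,
or handle it some other way is a design decision. The point is only that
injecting a weak hash lets the collision path be exercised by a test.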
Received on 2017-02-26 20:06:26 CET

This is an archived mail posted to the Subversion Dev mailing list.