[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Files with identical SHA1 breaks the repo

From: Garance A Drosehn <drosih_at_rpi.edu>
Date: Sun, 26 Feb 2017 11:08:21 -0500

On 24 Feb 2017, at 15:46, Stefan Sperling wrote:
>
> I believe we should prepare a new working format for 1.10.0
> which addresses this problem. I don't see a good way of fixing
> it without a format bump. The bright side of this is that it
> gives us a good reason to get 1.10.0 ready ASAP.
>
> We can switch to a better hash algorithm with a WC format
> bump.

One of the previous messages mentioned that better hash
algorithms are more expensive. So let me mention a tactic
that I used many years ago, when MD5 was the best digest
algorithm that I knew of, and I didn't trust it for the
larger files I was working with at the time:

Instead of going with a completely different hash algorithm,
just double-down on the one you're using. What I did was to
calculate one digest the standard way, and then a second one
which summed up every-other-byte (or every 3rd byte, or ...).
So to get a collision, not only do two files have to get the
same digest-result for all their data, but they have to also
get the same digest-result when exactly half the data is
skipped over.

(I did this a long time ago, and forget the details. What
I may have done for performance reasons was every-other-word,
not every-other-byte)

My thinking was that *any* single algorithm which processes
all the data is going to get collisions, eventually. But it
will be much harder for someone to generate a duplicate file
where there will also be a collision when summing up only
half of the data.

I'm not claiming this is great cure-all solution, but just
that it's an alternate tactic which might be interesting.
People could create repositories with just the one digest,
or upgrade it to use multiple digests if they have the need.

I found a few benchmarks which suggest that sha-256 is maybe
twice as expensive as sha-1, so calculating two sha-1 digests
might be a reasonable alternative.

-- 
Garance Alistair Drosehn                =     drosih_at_rpi.edu
Senior Systems Programmer               or   gad_at_FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA
Received on 2017-02-26 17:08:47 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.