Re: Files with identical SHA1 breaks the repo

From: Stefan <luke1410_at_posteo.de>
Date: Sun, 26 Feb 2017 23:22:09 +0100

On 2/25/2017 08:51, bert_at_qqmail.nl wrote:
> I remember some experiments in early development of WC-NG where we
> measured which checksums worked vs which ones were too expensive.
> Going to the SHA1 family was at least 5 times more expensive or so…
> We determined back then SHA1 was good enough for our use and that of
> our users ‘*except for those doing collision research*’.
> Just adding more checksums internally, because we can won’t help our
> users… The only real solution is doing full comparisons when checksums
> match… Which virtually never happens. It happened for the first time
> now, so most likely never before for all of the Subversion users together.
> This is how we used MD5 before… But we determined SHA1 would be good
> enough to avoid this, even when such a collision would be found… as it
> is today.
Valid point. I also think this still holds for normal usage today, just
as it did when the decision was made (i.e. it's extremely unlikely that
two commits will result in a SHA-1 hash collision, unless someone is
explicitly crafting such files).
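
To illustrate the "full comparison when checksums match" idea from the quoted text, here is a minimal sketch. It is not Subversion's actual code: PristineStore, sha1_hex and the hash_fn parameter are hypothetical names made up for this example, and the store is just an in-memory dict.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


class PristineStore:
    """Toy content store keyed by checksum (hypothetical, for illustration only)."""

    def __init__(self, hash_fn=sha1_hex):
        self._hash_fn = hash_fn
        self._blobs = {}  # checksum -> stored bytes

    def add(self, data: bytes) -> str:
        digest = self._hash_fn(data)
        existing = self._blobs.get(digest)
        if existing is None:
            # First time we see this checksum: store the content.
            self._blobs[digest] = data
        elif existing != data:
            # Same checksum, different bytes: a collision. Refuse the
            # content instead of silently treating it as a duplicate.
            raise ValueError(f"checksum collision detected for {digest}")
        return digest
```

The byte-for-byte comparison only runs when a checksum is already present, so the extra cost is paid for genuine duplicates and for collisions, not for every new piece of content.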
> I don’t think this incident changes those original ideas about which
> hash is good enough… Perhaps some careful re-evaluation is necessary,
> but I don’t think we should just ‘fix this’ by bumping everything to
> the next hashtype.
> This ‘just use a more expensive hash’ may be a good approach for other
> users of hashes, but I don’t think we want to make every common
> Subversion operations much slower because there is one collision found
> using an insane amount of CPU/GPU power.
I actually agree with you here (although initially I had a different
opinion on the matter, due to my (wrong?) assumption that the
performance cost of calculating SHA-2 instead of SHA-1 was more or less
acceptable).
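
For anyone who wants to check the relative cost on their own machine, a rough timing sketch follows. The helper name time_hash and the payload size are my own choices for illustration; the result depends heavily on the CPU and on how the underlying crypto library was built, so no particular ratio is claimed here.

```python
import hashlib
import time


def time_hash(algorithm: str, payload: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) to hash payload once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, payload).digest()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    payload = b"\0" * (16 * 1024 * 1024)  # 16 MiB of zero bytes
    for name in ("md5", "sha1", "sha256", "sha512"):
        print(f"{name}: {time_hash(name, payload):.4f} s")
```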

If using a SHA-2-based hash is in fact to be rejected due to its
performance impact (on current hardware), then I would however be in
favor of making the hash algorithm selectable (so we leave it to the
user to decide when the time is right to switch to an alternative
hash).
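
A minimal sketch of what a selectable hash algorithm could look like, assuming the algorithm name is recorded alongside each digest so that old and new checksums can coexist. The names DEFAULT_ALGORITHM, ALLOWED_ALGORITHMS and checksum are hypothetical and do not correspond to any existing Subversion option or API.

```python
import hashlib

# Hypothetical settings; in practice these would come from configuration.
DEFAULT_ALGORITHM = "sha1"
ALLOWED_ALGORITHMS = ("sha1", "sha256", "sha512")


def checksum(data: bytes, algorithm: str = DEFAULT_ALGORITHM) -> str:
    """Return '<algorithm>:<hex digest>' for data using the chosen algorithm."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).hexdigest()
    # Prefixing the digest with the algorithm name keeps stored
    # checksums unambiguous after the default is switched.
    return f"{algorithm}:{digest}"
```

For example, checksum(b"hello") uses the SHA-1 default, while checksum(b"hello", "sha256") opts in to SHA-256 without affecting checksums that were stored earlier.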

Yes, for now it unarguably still seems far too costly to generate
further SHA-1 hash collisions. So to effectively cause havoc on an SVN
server, one would still have to invest a considerable amount of
resources at the current time. But as time progresses, it will
undoubtedly become more and more realistic that SHA-1 hash collisions
can be generated with a reasonable investment (be it through further
improvements to the algorithms that generate the collisions, or simply
through computing power reaching the required level).

My personal opinion here is that, given how long I see SVN servers
staying in production use nowadays (even up to 10 years), it would be
reasonable to have something ready now rather than be sorry later.

> Of course we should fix things to not break, but that is a different
> story.
Absolutely right (and my reply in the previous section actually assumes
that the current issues would be solved).


Received on 2017-02-26 23:22:41 CET

This is an archived mail posted to the Subversion Dev mailing list.