I think there are two different points; here are my thoughts:
The first point is that both Repository and Client should work fine in case of collisions. Whatever algorithm we choose, we can never be sure that there won't be a collision.
That could be achieved, e.g., by a mandatory full-text comparison whenever the hashes match, and simply not using the cached representation if the hashes match but the full text is not identical. (I'm not sure about the performance implications, however.)
Similarly, the pristine store could put an "ABCD_2" file next to the "ABCD" file if hashes match, but full text does not match.
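As a rough sketch of that fallback (hypothetical Python, not the actual pristine-store code; the function name and on-disk layout are made up for illustration):

```python
import hashlib
import os

def store_pristine(store_dir: str, content: bytes) -> str:
    """Store content under its SHA-1 key; on a hash match with a
    different full text, fall back to a suffixed key ("ABCD_2")."""
    digest = hashlib.sha1(content).hexdigest()
    key = digest
    suffix = 1
    while True:
        path = os.path.join(store_dir, key)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(content)
            return key
        # Hash matched: only reuse the cached file if the full
        # text is byte-for-byte identical.
        with open(path, "rb") as f:
            if f.read() == content:
                return key
        # Genuine collision: try the next suffixed key.
        suffix += 1
        key = f"{digest}_{suffix}"
```

The hash is thus only a lookup hint; identity is always decided by the full-text comparison.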
The protocol must never rely on the hash as a unique identifier for the content - it should use it merely as a checksum to verify integrity.
As SVN is not a crypto product, neither integrity nor security should depend on the strength of a specific hash algorithm.
The second point is to reduce the likelihood of collisions, which is discussed in this (sub-)thread.
I tend to think that if SVN manages to work fine in case of collisions, then reducing the likelihood of collisions might not be worth that much effort. Collisions will still be a rare event, after all.
(And even those security researchers who work with sets of colliding files will still have a working system, although it might not be fully optimized.)
As we already have the SHA1 and MD5 algorithms established, we could simply combine the two. That should deliver acceptable performance, while still keeping the collision probability low enough.
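Combining the two digests is essentially a one-liner on top of what is already computed (Python sketch; the function name is mine, this is not an actual SVN API):

```python
import hashlib

def combined_checksum(data: bytes) -> str:
    """Concatenate the SHA-1 and MD5 digests into one 288-bit
    checksum. A collision now requires a single pair of inputs
    that defeats both algorithms simultaneously."""
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    return sha1 + "-" + md5
```

The known SHA-1 collision pairs do not also collide under MD5, so such a combined checksum would tell them apart.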
Or maybe it's just not worth any effort, once the first point is solved.
DOSing a repository by checking in many colliding files still requires write access, and there are more effective ways to cause damage once you have write access.
3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50
E-Mail: email@example.com | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com
CODESYS forum: http://forum.codesys.com
Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915
> From: Garance A Drosehn [mailto:drosih_at_rpi.edu]
> Sent: Thursday, March 02, 2017 7:32 PM
> To: Daniel Shahaf
> Cc: Subversion Development
> Subject: Re: Files with identical SHA1 breaks the repo
> On 2 Mar 2017, at 6:37, Daniel Shahaf wrote:
> > Garance A Drosehn wrote on Wed, Mar 01, 2017 at 14:48:07 -0500:
> >> I do not see how this extended-sha1 would be easier to break than
> >> current sha1, because it includes the current sha1, unchanged.
> > Regarding "easier to break", you were probably thinking of collision
> > attacks, which are never made easier by extending the output. I was
> > thinking of "first preimage" attacks (finding a message that has a
> > given checksum). Extending the output can make _these_ attacks
> > easier.
> Yes, in the context of subversion I'm thinking of collision attacks,
> and only collision attacks.
> > That's called a "second preimage attack". The extended hash
> > is not secure against such an attack: a hash function with a 320-bit
> > output is supposed to require
> > O(2³²⁰) evaluations to find a second preimage, but it's trivial to
> > find a second preimage for F and G simultaneously with only O(2¹⁶⁰)
> > evaluations. (Do you see that?)
> But consider that sha1 is a 160-bit output. With the paper that
> Google just released, did they need to do O(2¹⁶⁰) evaluations to
> create a collision?
> When I looked at their announcement, I noticed the following comments:
> "In practice, collisions should never occur for secure
> hash functions. However if the hash algorithm has some
> flaws, as SHA-1 does, a well-funded attacker can craft
> a collision."
> As to how much work it was:
> "- 6,500 years of CPU computation to complete the
> attack first phase.
> - 110 years of GPU computation to complete the
> second phase."
> "While those numbers seem very large, the SHA-1
> shattered attack is still more than 100,000 times
> faster than a brute force attack, which remains
> impractical."
> My thinking is that their attack is not a brute-force attack, and
> thus assuming the ideal of O(2¹⁶⁰) is misleading.
> "In 2013, Marc Stevens published a paper that outlined
> a theoretical approach to create a SHA-1 collision"
> I'll admit that I have no idea how his theoretical insight would
> apply to a double-sha1 hash. But if all it does is get us back to
> O(2¹⁶⁰) (despite using 320-bits!), that leaves us 100,000 times
> better off than we currently are.
> >> [...]. But it might be
> >> faster to use this tactic instead of making the much more disruptive
> >> change of totally dropping sha1 for sha256.
> >> (I don't mean it will necessarily be faster to implement, but just
> >> that it might run faster in day-to-day operations)
> > I concede the performance issue.
> > Cheers,
> Indeed, performance is the only reason I suggested this. Some
> comments in this thread were concerned about the performance hit of
> moving to sha256.
> I was also thinking it *might* be easier to make a double-sha1
> repository backwards-compatible with older versions of svn.
> But I do not know anything about the code in subversion, so that's
> total speculation on my part.
> Thanks for the extra info. The paper was interesting, even if some
> of it went over my head...
> Garance Alistair Drosehn = drosih_at_rpi.edu
> Senior Systems Programmer or gad_at_FreeBSD.org
> Rensselaer Polytechnic Institute; Troy, NY; USA
Received on 2017-03-03 08:18:31 CET