RE: Files with identical SHA1 breaks the repo

From: <bert_at_qqmail.nl>
Date: Sat, 25 Feb 2017 08:51:14 +0100

I remember some experiments early in WC-NG development where we measured which checksums worked and which ones were too expensive. Going to the SHA1 family was at least 5 times more expensive or so…
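A measurement of this kind can be sketched in a few lines of Python. This is only an illustration of how such a comparison might be run, not the experiment referred to above: the helper name `time_hashes`, the payload size, and the choice of algorithms are all assumptions, and the numbers depend heavily on the machine and the hash implementation.

```python
import hashlib
import time

def time_hashes(data, names=("md5", "sha1", "sha256"), repeats=3):
    """Return the best-of-N wall-clock seconds needed to hash `data` per algorithm.

    `names` are hashlib algorithm names; the selection here is illustrative.
    """
    results = {}
    for name in names:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results

if __name__ == "__main__":
    payload = b"\0" * (8 * 1024 * 1024)  # 8 MiB of zeros, an arbitrary test payload
    for name, seconds in time_hashes(payload).items():
        print(f"{name:8s} {seconds * 1000:8.2f} ms")
```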

We determined back then that SHA1 was good enough for our use and that of our users, ‘except for those doing collision research’.

Just adding more checksums internally because we can won’t help our users… The only real solution is doing full comparisons when checksums match, which virtually never happens. It happened for the first time now, so most likely it never happened before for any Subversion user.

This is how we used MD5 before… But we determined SHA1 would be good enough to avoid this, even if such a collision were found… as has happened now.
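The "full comparison when checksums match" idea can be sketched as a toy content store. This is a minimal illustration under stated assumptions, not how libsvn_wc or FSFS is implemented: the class `VerifyingStore`, its bucket layout, and the `(key, index)` return value are all hypothetical. The point is that the expensive byte-for-byte comparison only runs when a key already exists.

```python
import hashlib

class VerifyingStore:
    """Toy content store keyed by a checksum, with a full comparison on key match."""

    def __init__(self, hash_func=lambda data: hashlib.sha1(data).hexdigest()):
        self._hash = hash_func
        self._blobs = {}  # key -> list of distinct contents that share this key

    def add(self, data):
        """Store `data` and return (key, index) identifying exactly this content."""
        key = self._hash(data)
        bucket = self._blobs.setdefault(key, [])
        for index, existing in enumerate(bucket):
            if existing == data:  # full byte comparison, only on a checksum match
                return key, index
        bucket.append(data)  # first content for this key, or a genuine collision
        return key, len(bucket) - 1

    def get(self, key, index=0):
        """Return the content stored under `key` at position `index`."""
        return self._blobs[key][index]
```

Passing a deliberately weak `hash_func` (for example one that returns a constant) is an easy way to exercise the collision path without needing real colliding files.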

I don’t think this incident changes those original ideas about which hash is good enough… Perhaps some careful re-evaluation is necessary, but I don’t think we should just ‘fix this’ by bumping everything to the next hash type.

This ‘just use a more expensive hash’ approach may be good for other users of hashes, but I don’t think we want to make every common Subversion operation much slower because one collision was found using an insane amount of CPU/GPU power.

Of course we should fix things to not break, but that is a different story.

     Bert

From: Stefan Sperling
Sent: Friday, 24 February 2017 17:10
To: Andreas Stieger
Cc: Subversion Development
Subject: Re: Files with identical SHA1 breaks the repo

On Fri, Feb 24, 2017 at 04:17:44PM +0100, Andreas Stieger wrote:
> Hi,
>
> "Stefan Hett" wrote:
> > On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > > This is the only known SHA-1 collision at the moment, but Google will
> > > release the collision code in 90 days, so we can expect this not to last
> > > forever.
> > Reading up on that in an article in a German magazine [1] clarifies that
> > the effort to create that hash is still quite large (6500 CPU years + 100
> > GPU years to calculate the collision). So this puts the impact into perspective a bit.
> > Certainly I'm not trying to say that the situation on SVN's side
> > should/could not be improved, though.
> >
> > [1]
> > https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html
>
> An occurrence of this issue in a production repository with the published PDFs:
> https://bugs.webkit.org/show_bug.cgi?id=168774#c29
>
> Andreas

Well, what did they expect? Did they expect that all software which is
part of their toolchain has ever been tested with files that produce
a SHA1 collision? Nobody had such files until yesterday...
They should have tried this on a test repository first.

Anyway, so SVN has multiple problems with SHA1 collisions.

One problem is that the libsvn_wc code does the wrong thing when SHA1
hashes match but MD5 hashes do not. The error on checkout is happening
because pristines are keyed on SHA1, and only one pristine is saved:

$ ls .svn/pristine/
38/
$ ls .svn/pristine/38/
38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
$ sha1 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
SHA1 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
$ md5 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
MD5 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = ee4aa52b139d925f8d8884402b0a750c

By design, the current working copy format cannot store both of these PDFs.
This is hard to solve without a working copy format bump :-/
The best fix would probably be moving libsvn_wc to SHA256 or SHA3.
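The layout visible in the listing above (a two-character directory followed by `<sha1>.svn-base`) can be sketched as follows. This is a simplified illustration of why only one pristine survives, not the libsvn_wc code; the function name `pristine_path_for_sha1` is hypothetical. Because the path is a function of the SHA1 alone, two different files with the same SHA1 necessarily map to the same pristine file.

```python
import posixpath

def pristine_path_for_sha1(wc_root, sha1_hex):
    """Build a pristine path from a SHA1 hex digest, mirroring the listing above.

    The content itself plays no part in the path, so any two files with
    the same SHA1 end up at the same location.
    """
    sha1_hex = sha1_hex.lower()
    if len(sha1_hex) != 40 or any(c not in "0123456789abcdef" for c in sha1_hex):
        raise ValueError("expected a 40-character hexadecimal SHA1 digest")
    return posixpath.join(wc_root, ".svn", "pristine", sha1_hex[:2],
                          sha1_hex + ".svn-base")

if __name__ == "__main__":
    shared_sha1 = "38762cf7f55934b34d179ae6a4c80cadccbb7f0a"
    # Both PDFs have this SHA1, so both would resolve to this single path.
    print(pristine_path_for_sha1(".", shared_sha1))
```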

FSFS looks alright. The node records for these two PDFs look like this:

[[[
id: 0-1.0.r1/5
type: file
count: 0
text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-1.pdf
copyroot: 0 /

id: 2-1.0.r1/6
type: file
count: 0
text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-2.pdf
copyroot: 0 /
]]]

We should look into making the FSFS code make use of both checksums to
handle ambiguities. It seems about time to add SHA256 and/or SHA3 as well.
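As a rough sketch of what "using both checksums" could mean, the snippet below parses the `text:` lines from the two node records above and treats two representations as the same content only when the size and both checksums agree. The field names in `TextRep` are assumptions made for illustration (the source only shows the raw values), and `same_content_claimed` is a hypothetical helper, not an FSFS function.

```python
from collections import namedtuple

# Field names are illustrative guesses; only the values come from the records above.
TextRep = namedtuple(
    "TextRep", "revision item size expanded_size md5 sha1 uniquifier")

def parse_text_rep(line):
    """Parse a 'text: ...' line from a node record into a TextRep."""
    prefix = "text: "
    if not line.startswith(prefix):
        raise ValueError("not a 'text:' line")
    fields = line[len(prefix):].split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    rev, item, size, expanded, md5, sha1, uniquifier = fields
    return TextRep(int(rev), int(item), int(size), int(expanded),
                   md5, sha1, uniquifier)

def same_content_claimed(a, b):
    """True only if the expanded size and BOTH checksums agree."""
    return (a.expanded_size, a.md5, a.sha1) == (b.expanded_size, b.md5, b.sha1)

if __name__ == "__main__":
    one = parse_text_rep("text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c "
                         "38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3")
    two = parse_text_rep("text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 "
                         "38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4")
    print("SHA1 equal:", one.sha1 == two.sha1)
    print("treated as same content:", same_content_claimed(one, two))
```

Comparing the pair of checksums costs nothing extra when both are already stored in the record, which is why it is attractive as a first disambiguation step before any full content comparison.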

'svnadmin load' fails, too:

$ svnadmin create repo2
$ vi repo
repo/ repo2/
$ vi repo2/db/fs
fs-type fsfs.conf
$ vi repo2/db/fsfs.conf # disable rep-sharing
$ svnadmin dump repo > repo.dump
* Dumped revision 0.
* Dumped revision 1.
$ svnadmin load repo2 < repo.dump
<<< Started new transaction, based on original revision 1
     * editing path : shattered-1.pdf ... done.
     * editing path : shattered-2.pdf ...subversion/libsvn_repos/load.c:709,
subversion/libsvn_repos/load.c:351,
subversion/libsvn_subr/stream.c:273,
subversion/libsvn_subr/checksum.c:658: (apr_err=SVN_ERR_CHECKSUM_MISMATCH)
svnadmin: E200014: Checksum mismatch for '/shattered-2.pdf':
   expected: 5bd9d8cabc46041579a311230539b8d1
     actual: ee4aa52b139d925f8d8884402b0a750c

Again, the dump file looks OK. This problem occurs somewhere in the
commit processing path. No time to debug this ATM.
Received on 2017-02-25 08:51:31 CET

This is an archived mail posted to the Subversion Dev mailing list.
