
Re: Files with identical SHA1 breaks the repo

From: Stefan Sperling <stsp_at_elego.de>
Date: Fri, 24 Feb 2017 17:10:20 +0100

On Fri, Feb 24, 2017 at 04:17:44PM +0100, Andreas Stieger wrote:
> Hi,
>
> "Stefan Hett" wrote:
> > On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > > This is the only known SHA-1 collision at the moment, but Google will
> > > release the collision code in 90 days, so we can expect this not to last
> > > forever.
> > Reading up on that in an article in a German magazine [1] clarifies that
> > the effort to create that collision is still quite large (6500 CPU years + 100
> > GPU years to calculate it). So this puts the impact into perspective a bit.
> > Certainly I'm not trying to say that the situation on SVN's side
> > should/could not be improved, though.
> >
> > [1]
> > https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html
>
> An occurrence of this issue in a production repository with the published PDFs:
> https://bugs.webkit.org/show_bug.cgi?id=168774#c29
>
> Andreas

Well, what did they expect? Did they expect that all software which is
part of their toolchain had been tested with files that produce
a SHA1 collision? Nobody had such files until yesterday...
They should have tried this on a test repository first.

Anyway, so SVN has multiple problems with SHA1 collisions.

One problem is that the libsvn_wc code mishandles the case where SHA1
hashes match but MD5 hashes do not. The checkout error happens because
pristines are keyed on SHA1 alone, so only one pristine is saved for
both files:

$ ls .svn/pristine/
38/
$ ls .svn/pristine/38/
38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
$ sha1 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
SHA1 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
$ md5 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
MD5 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = ee4aa52b139d925f8d8884402b0a750c
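The sha1/md5 checks above can be reproduced in Python. This is a minimal sketch that assumes only the layout visible in the listing (`.svn/pristine/<first two hex digits>/<sha1>.svn-base`); the helper names `pristine_path` and `file_checksums` are hypothetical and not part of libsvn_wc.

```python
import hashlib
import os


def pristine_path(wc_root: str, sha1_hex: str) -> str:
    """Pristine location in the layout seen above: the first two hex
    digits of the SHA1 name the subdirectory (assumed layout)."""
    return os.path.join(wc_root, ".svn", "pristine", sha1_hex[:2],
                        sha1_hex + ".svn-base")


def file_checksums(path: str, chunk_size: int = 65536) -> tuple:
    """Return (sha1, md5) hex digests of a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
    return sha1.hexdigest(), md5.hexdigest()


if __name__ == "__main__":
    # Hypothetical check inside a working copy: does the single stored
    # pristine match its own name, and which file's MD5 does it have?
    sha1_hex = "38762cf7f55934b34d179ae6a4c80cadccbb7f0a"
    path = pristine_path(".", sha1_hex)
    if os.path.exists(path):
        actual_sha1, actual_md5 = file_checksums(path)
        print("SHA1 matches name:", actual_sha1 == sha1_hex)
        print("MD5:", actual_md5)
```

Comparing the printed MD5 against the MD5 recorded for each file shows which of the two PDFs the working copy actually kept.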

By design, the current working copy format cannot store both of these PDFs.
This is hard to solve without a working copy format bump :-/
The best fix would probably be moving libsvn_wc to SHA256 or SHA3.
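To illustrate what a different key hash would mean for a content-addressed pristine store, here is a toy in-memory store keyed on a configurable digest. It is only a sketch: the class `PristineStore` is hypothetical, it does not model the on-disk format, and it assumes (like the single pristine seen above) that content whose key already exists is treated as identical and not stored again.

```python
import hashlib


class PristineStore:
    """Toy content-addressed store; the key is a digest of the content."""

    def __init__(self, algorithm: str = "sha256"):
        # Any name accepted by hashlib.new(), e.g. "sha1", "sha256", "sha3_256".
        self.algorithm = algorithm
        self.blobs = {}  # key (hex digest) -> content bytes

    def key(self, data: bytes) -> str:
        return hashlib.new(self.algorithm, data).hexdigest()

    def install(self, data: bytes) -> str:
        """Store data and return its key. If the key is already present,
        the existing content is kept and the new data is dropped."""
        k = self.key(data)
        self.blobs.setdefault(k, data)
        return k

    def read(self, k: str) -> bytes:
        return self.blobs[k]


if __name__ == "__main__":
    store = PristineStore("sha3_256")
    key = store.install(b"hello")
    print(len(key))  # a SHA3-256 key is 64 hex digits
```

With `PristineStore("sha1")` and the two shattered PDFs, the second `install` would return the same key and silently keep the first file's bytes; with SHA256 or SHA3 as the key, no such colliding pair is currently known.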

FSFS looks alright. The node records for these two PDFs look like this:

[[[
id: 0-1.0.r1/5
type: file
count: 0
text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-1.pdf
copyroot: 0 /

id: 2-1.0.r1/6
type: file
count: 0
text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-2.pdf
copyroot: 0 /
]]]
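The two records can be compared mechanically. A minimal sketch that splits a `text:` line into its fields; the field names (revision, item index, stored size, expanded size, MD5, SHA1, uniquifier) reflect my reading of the FSFS node-revision layout and should be treated as an assumption, and `parse_text_rep` is a hypothetical helper.

```python
from typing import NamedTuple


class TextRep(NamedTuple):
    revision: int
    item_index: int     # assumed: item index (byte offset in older formats)
    size: int           # assumed: size of the stored, possibly deltified data
    expanded_size: int  # assumed: size of the fulltext
    md5: str
    sha1: str
    uniquifier: str


def parse_text_rep(line: str) -> TextRep:
    """Parse a 'text: ...' line from an FSFS node-revision record."""
    key, _, value = line.partition(":")
    if key.strip() != "text":
        raise ValueError("not a text: line: " + line)
    rev, item, size, expanded, md5, sha1, uniq = value.split()
    return TextRep(int(rev), int(item), int(size), int(expanded),
                   md5, sha1, uniq)


a = parse_text_rep("text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c "
                   "38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3")
b = parse_text_rep("text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 "
                   "38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4")
# Same SHA1 and sizes, different MD5: only the MD5 tells the PDFs apart.
print(a.sha1 == b.sha1, a.md5 == b.md5)  # True False
```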

We should look into making the FSFS code make use of both checksums to
handle ambiguities. It seems about time to add SHA256 and/or SHA3 as well.
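One way to use both checksums would be to keep looking up shareable representations by SHA1, but to refuse reuse unless the MD5 matches as well. The sketch below is an in-memory model only: `RepCache`, its methods and the location strings are hypothetical and do not reflect the real rep-cache.db schema. It uses the two digest pairs from the node records above.

```python
from typing import Dict, Optional, Tuple


class RepCache:
    """Toy rep-sharing cache: SHA1 -> (MD5, location of the stored rep)."""

    def __init__(self) -> None:
        self._by_sha1: Dict[str, Tuple[str, str]] = {}

    def add(self, sha1: str, md5: str, location: str) -> None:
        # The first rep seen for a SHA1 wins; a colliding one is not cached.
        self._by_sha1.setdefault(sha1, (md5, location))

    def lookup(self, sha1: str, md5: str) -> Optional[str]:
        """Return a reusable rep location, or None if the content must be
        stored anew. A SHA1 hit with a different MD5 is treated as a
        collision, not as identical content."""
        entry = self._by_sha1.get(sha1)
        if entry is None:
            return None
        stored_md5, location = entry
        return location if stored_md5 == md5 else None


SHA1 = "38762cf7f55934b34d179ae6a4c80cadccbb7f0a"
cache = RepCache()
cache.add(SHA1, "ee4aa52b139d925f8d8884402b0a750c", "r1 item 3")  # placeholder location
print(cache.lookup(SHA1, "ee4aa52b139d925f8d8884402b0a750c"))  # r1 item 3
print(cache.lookup(SHA1, "5bd9d8cabc46041579a311230539b8d1"))  # None
```

This only avoids wrongly sharing content; the second file would simply not be deduplicated. Keying on a stronger hash such as SHA256 would let both entries coexist in the cache.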

'svnadmin load' fails, too:

$ svnadmin create repo2
$ vi repo2/db/fsfs.conf # disable rep-sharing
$ svnadmin dump repo > repo.dump
* Dumped revision 0.
* Dumped revision 1.
$ svnadmin load repo2 < repo.dump
<<< Started new transaction, based on original revision 1
     * editing path : shattered-1.pdf ... done.
     * editing path : shattered-2.pdf ...subversion/libsvn_repos/load.c:709,
subversion/libsvn_repos/load.c:351,
subversion/libsvn_subr/stream.c:273,
subversion/libsvn_subr/checksum.c:658: (apr_err=SVN_ERR_CHECKSUM_MISMATCH)
svnadmin: E200014: Checksum mismatch for '/shattered-2.pdf':
   expected: 5bd9d8cabc46041579a311230539b8d1
     actual: ee4aa52b139d925f8d8884402b0a750c
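The E200014 error comes from a checksum check on a content stream: the MD5 of the data actually seen is compared with the MD5 recorded in the dump. Note that the "actual" value is the MD5 of shattered-1.pdf from the node records above. Below is a minimal Python model of such a check; `VerifyingReader` and `ChecksumMismatch` are hypothetical names, and the message only imitates the svnadmin output.

```python
import hashlib
import io
from typing import BinaryIO


class ChecksumMismatch(Exception):
    pass


class VerifyingReader:
    """Wrap a binary stream and verify its MD5 once the end is reached."""

    def __init__(self, stream: BinaryIO, expected_md5: str, path: str):
        self._stream = stream
        self._expected = expected_md5
        self._path = path
        self._md5 = hashlib.md5()

    def read(self, size: int = -1) -> bytes:
        if size == 0:
            return b""
        data = self._stream.read(size)
        self._md5.update(data)
        if size < 0 or not data:
            # Read-all or end of stream: everything has been hashed.
            self._verify()
        return data

    def _verify(self) -> None:
        actual = self._md5.hexdigest()
        if actual != self._expected:
            raise ChecksumMismatch(
                f"Checksum mismatch for '{self._path}':\n"
                f"   expected: {self._expected}\n"
                f"     actual: {actual}")


if __name__ == "__main__":
    reader = VerifyingReader(io.BytesIO(b"abc"),
                             "900150983cd24fb0d6963f7d28e17f72", "/abc.txt")
    while reader.read(2):
        pass
    print("ok")
```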

Again, the dump file looks OK. This problem occurs somewhere in the
commit processing path. No time to debug this ATM.
Received on 2017-02-24 17:10:40 CET

This is an archived mail posted to the Subversion Dev mailing list.