Johan Corveleyn wrote on Thu, May 11, 2017 at 01:34:18 +0200:
> On Wed, May 10, 2017 at 1:44 PM, Bert Huijben <bert_at_qqmail.nl> wrote:
> >> -----Original Message-----
> >> From: Stefan Sperling [mailto:stsp_at_apache.org]
> >> Sent: woensdag 10 mei 2017 13:34
> >> To: Daniel Shahaf <d.s_at_daniel.shahaf.name>
> >> Cc: dev_at_subversion.apache.org; commits_at_subversion.apache.org
> >> Subject: Re: svn commit: r1794632 - /subversion/trunk/notes/sha1-
> >> advisory.txt
> >>
> >> On Wed, May 10, 2017 at 09:11:50AM +0000, Daniel Shahaf wrote:
> >> > > Summary:
> >> > > ========
> >> > >
> >> > > Subversion repositories can be corrupted by committing two files
> >> > > which have different content, yet produce the same SHA1 checksum.
> >> >
> >> > I don't think we should call this "corruption": the on-disk data
> >> > structures are intact, both syntactically and semantically.
>
> Except the committed collision files, I guess. IIUC only one of both
> contents is stored, right? But okay, apart from those sha1-colliding
> contents themselves, the rest is in principle intact (but rendered
> partially inaccessible).
Right: only one of the two files is stored. (After 'svn import
shattered[12].pdf', there is only one ENDREP in the revision file.)
So, strictly speaking, this is a data loss for users who deliberately
commit sha1 collisions into their repositories. We've already
determined that there are two such use-cases: the webkit use-case — who
can just re-download shattered1.pdf — and researchers, who ought to have
disabled rep-sharing in the first place. That is: it's not a very
severe issue for the average user.
> >> > The problem
> >> > is in the libraries' assumption that sha1 has no collisions.
> >> >
> >> > I'm afraid I don't have a good suggestion; perhaps "Distinct files that
> >> > have equal sha1 checksums cannot be checked out"?
> >>
> >> I think we should call it corruption simply because it looks like
> >> that to our users when it happens (see webkit).
> >>
> >> This is a user-facing text. We want users to take action and upgrade so
> >> they won't run into the problem. The purpose of this text is to raise
> >> awareness. It is not to communicate technical details of the problem,
> >> which can be obtained by other means (reading code, mailing lists, etc.)
> >>
> >> I expect "corruption" will turn on people's alarm bells more than your
> >> suggested wording which is very exact but also sounds less dramatic.
> >
> > Those alarm bells are the reason why I wouldn't call it corruption, as that
> > part will probably be highlighted in the media, while there is nothing
> > corrupt on disk.
>
> Hm, I think Bert has a point. We should find the middle ground between
> making it dramatic enough (to attract attention and strong motivation
> to upgrade), but not causing panic and making this the only quote that
> will appear in the media.
>
> Maybe something like this?
>
> "Subversion repositories can be broken, becoming partly inaccessible,
> by committing two files which have different content, yet produce the
> same SHA1 checksum. There is no data loss, but parts of the repository
> can no longer be checked out or committed into."
Well, there _is_ data loss, so:
Subversion fails to store a file that has the same sha1 as another
file in the repository. Attempts to retrieve the first file would
fail with a checksum error (from the md5 checksum that we also use),
however, if the two files had not only equal sha1's but also equal md5's,
then the wrong content would silently be returned.
Plus a blurb about how that's not going to ever happen by accident.
> Separately: maybe we should include a reference to the pre-commit hook
> script, for completeness of the possible ways to protect oneself. I
> think it will be clear that upgrading to 1.9.6 will be the better fix,
> but that the pre-commit hook might buy you some time if you can't
> upgrade immediately.
+1
Received on 2017-05-11 07:21:01 CEST