Re: Files with identical SHA1 breaks the repo

From: Stefan Sperling <stsp_at_elego.de>
Date: Wed, 1 Mar 2017 11:01:40 +0100

On Tue, Feb 28, 2017 at 10:17:34PM -0600, Greg Stein wrote:
> I really like this idea.
>
> And we could take a copy of APR's sha1 code, and rejigger it to perform
> *both* hashes during the same scan of the raw bytes. I would expect the
> time taken to extend by (say) 1.1X rather than a full 2X. The inner loop
> might cost a bit more, but we'd only scan the bytes once. Very handy, when
> you're talking about megabytes in a stream-y environment.
>
> (and medium-term, push this dual-sha1 computation back into APR)
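
To make the single-scan idea above concrete, here is a minimal sketch in
Python rather than APR's C. It is not APR or Subversion code. The mail does
not say what the second hash is, so the sketch assumes it is a SHA-1 over a
fixed prefix plus the same content; the name dual_sha1 and the salt value
are hypothetical.

```python
import hashlib
import io


def dual_sha1(stream, salt=b"hypothetical-salt", chunk_size=64 * 1024):
    """Return (plain, salted) SHA-1 hex digests from one pass over stream.

    Assumption: the second hash is SHA-1 over salt + content. The thread
    does not define it; this is only an illustration.
    """
    plain = hashlib.sha1()
    salted = hashlib.sha1(salt)  # second hash state, seeded with the prefix
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each chunk is read from the stream once and fed to both states.
        plain.update(chunk)
        salted.update(chunk)
    return plain.hexdigest(), salted.hexdigest()


if __name__ == "__main__":
    first, second = dual_sha1(io.BytesIO(b"example content"))
    print(first)
    print(second)
```

The bytes are read only once, but both hash states still process every
chunk, so this shows the structure of the idea and not the 1.1X cost
estimate, which would depend on interleaving the two computations in the
inner loop.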

The idea is nice and I would support its application.
Note, however, that it does not help with fixing current releases.
We would need to store this second hash somewhere, which implies a format
and/or protocol change, depending on where the idea is applied (rep-cache,
ra-serf, pristine store, ...).

For now, we should focus on solutions that can be backported because that's
what our users need most. Our current formats and protocols will only
store/send MD5 and SHA1 of the full content, so for 1.8 and 1.9 we will
have to find something that works within these restrictions.
One option would be to disable affected features. But some features, such
as the pristine store, can't simply be disabled.

In theory the existing system should work as it is, as long as only one side
of the collision is allowed to survive. We will need format changes only to
allow storing both colliding PDFs. We could delay 1.10 a bit to gain time
for working out long-term solutions which imply format changes.
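
As an illustration of "only one side of the collision is allowed to
survive", here is a rough sketch of a content store keyed by SHA1 that
refuses a second, different content under an existing key. It is not how
Subversion implements rep-cache or the pristine store. FirstWinsStore,
CollisionError and the key_hash/check_hash parameters are hypothetical
names, and using MD5 as the cross-check is an assumption based only on the
fact that current formats carry both MD5 and SHA1.

```python
import hashlib


class CollisionError(Exception):
    """Raised when different content arrives under an existing key."""


def _sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def _md5_hex(data):
    return hashlib.md5(data).hexdigest()


class FirstWinsStore:
    """Hypothetical store: the first content seen for a key survives."""

    def __init__(self, key_hash=_sha1_hex, check_hash=_md5_hex):
        self._key_hash = key_hash      # addresses the content (SHA1 here)
        self._check_hash = check_hash  # cross-check (MD5 here, an assumption)
        self._entries = {}             # key -> (check value, content)

    def add(self, content):
        key = self._key_hash(content)
        check = self._check_hash(content)
        existing = self._entries.get(key)
        if existing is None:
            self._entries[key] = (check, content)
        elif existing[0] != check:
            # Same key, different content: reject, keep the first entry.
            raise CollisionError("content collides with existing key " + key)
        return key

    def get(self, key):
        return self._entries[key][1]


if __name__ == "__main__":
    # A constant key function stands in for a real SHA1 collision.
    store = FirstWinsStore(key_hash=lambda data: "same-key")
    store.add(b"first file")
    try:
        store.add(b"second file")
    except CollisionError as err:
        print("rejected:", err)
```

Storing both files would need a key that tells them apart, which is where
the format change comes in.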
Received on 2017-03-01 11:02:00 CET

This is an archived mail posted to the Subversion Dev mailing list.