[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Why do we check the base checksum so often?

From: Hyrum K Wright <hyrum.wright_at_wandisco.com>
Date: Mon, 6 Feb 2012 14:21:34 -0600

On Mon, Feb 6, 2012 at 1:26 PM, Greg Stein <gstein_at_gmail.com> wrote:
> On Mon, Feb 6, 2012 at 12:49, Hyrum K Wright <hyrum.wright_at_wandisco.com> wrote:
>>...
>>> Yeah, sounds like we're in a tough spot here.  The checksums in Ev1 exist
>>> only as sanity checks -- they definitely are NOT the primary selection
>>> mechanism for the editor implementation's base text.
>
> Right. This is a key point, and Hyrum's earlier emails adopted a tone
> of using the base checksum as a key. The real semantic behind that
> checksum is, "I'm going to send you a delta against some text base
> that we've negotiated or assumed through an out-of-band communication,
> and to verify that our communication is correct, that text base should
> have CHECKSUM."
>
> Ev2 gets rid of all that hand-wave magic and leaves deltafication to
> other layers (notably, RA layers sending stuff over the wire; ra_local
> will not need to deltify at all).
>
>>> I assume we still have a valid checksum to pass to close_file() (the
>>> checksum of the post-edit fulltext for that file), right?  It's just the
>>> pre-edit base checksum that's the problem?
>>
>> Correct.
>>
>> r1241097 relaxes the checks by special-casing the checksum of the
>> empty stream (as discussed elsethread).  This addresses the immediate
>> issue, and I think the generalized case can be punted toward the
>> individual ra-layers long-term.
>
> Note that this will break third-party Ev1 receivers (since they assume
> something other than the empty stream). You happened to fix *ours*,
> but not theirs.

Receivers (such as ours) are making unreasonable assumptions, in my
opinion. The docs for apply_textdelta() read thusly:

   * @a base_checksum is the hex MD5 digest for the base text against
   * which the delta is being applied; it is ignored if NULL, and may
   * be ignored even if not NULL. If it is not ignored, it must match
   * the checksum of the base text against which svndiff data is being
   * applied; if it does not, @c apply_textdelta or the @a *handler call
   * which detects the mismatch will return the error
   * SVN_ERR_CHECKSUM_MISMATCH (if there is no base text, there may
   * still be an error if @a base_checksum is neither NULL nor the hex
   * MD5 checksum of the empty string).

All the docs guarantee is that this checksum must match the base
checksum against which the delta is being applied. They don't assume
that the receiver has the content, nor that it will even pay attention
to the checksums themselves. If the receiver has done some kind of
backdoor secret handshake that's outside the scope of the delta
editor, and it should (reasonably) not care.

I maintain that it is still legal (though perhaps not very useful) for
the base checksum supplied by apply_textdelta() to be whatever
apply_textdelta() wants it to be.

> I think this is the wrong fix, for compatibility purposes. The
> Ev2->Ev1 shim should have a callback to say "please get me the
> context/checksum to generate a delta against."
>
> If the Ev2->Ev1 shim is ever auto-inserted for backwards compat
> reasons, then this is breakage. On that basis, I've gotta -1 shipping
> and using the shim as-is. Using it is a long ways out, so this hack is
> fine for now and for development. So you've got a long while before it
> needs to get fixed :-)

Sure, but I'm concerned that if this doesn't get fixed while we're
thinking about it now, it will languish for a while, and the bite
somebody when they come back to address it months/years from now.

-Hyrum

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/
Received on 2012-02-06 21:22:07 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.