Re: Why do we check the base checksum so often?

From: Greg Stein <gstein_at_gmail.com>
Date: Mon, 6 Feb 2012 15:41:06 -0500

On Mon, Feb 6, 2012 at 15:29, Hyrum K Wright <hyrum.wright_at_wandisco.com> wrote:
> On Mon, Feb 6, 2012 at 2:21 PM, Hyrum K Wright
> <hyrum.wright_at_wandisco.com> wrote:
>> On Mon, Feb 6, 2012 at 1:26 PM, Greg Stein <gstein_at_gmail.com> wrote:
>>> On Mon, Feb 6, 2012 at 12:49, Hyrum K Wright <hyrum.wright_at_wandisco.com> wrote:
>>>>...
>>>>> Yeah, sounds like we're in a tough spot here. The checksums in Ev1 exist
>>>>> only as sanity checks -- they definitely are NOT the primary selection
>>>>> mechanism for the editor implementation's base text.
>>>
>>> Right. This is a key point, and Hyrum's earlier emails adopted a tone
>>> of using the base checksum as a key. The real semantic behind that
>>> checksum is, "I'm going to send you a delta against some text base
>>> that we've negotiated or assumed through an out-of-band communication,
>>> and to verify that our communication is correct, that text base should
>>> have CHECKSUM."
>>>
>>> Ev2 gets rid of all that hand-wave magic and leaves deltafication to
>>> other layers (notably, RA layers sending stuff over the wire; ra_local
>>> will not need to deltify at all).
>>>
>>>>> I assume we still have a valid checksum to pass to close_file() (the
>>>>> checksum of the post-edit fulltext for that file), right? It's just the
>>>>> pre-edit base checksum that's the problem?
>>>>
>>>> Correct.
>>>>
>>>> r1241097 relaxes the checks by special-casing the checksum of the
>>>> empty stream (as discussed elsethread). This addresses the immediate
>>>> issue, and I think the generalized case can be punted toward the
>>>> individual ra-layers long-term.
>>>
>>> Note that this will break third-party Ev1 receivers (since they assume
>>> something other than the empty stream). You happened to fix *ours*,
>>> but not theirs.
>>
>> Receivers (such as ours) are making unreasonable assumptions, in my
>> opinion. The docs for apply_textdelta() read thusly:
>>
>> * @a base_checksum is the hex MD5 digest for the base text against
>> * which the delta is being applied; it is ignored if NULL, and may
>> * be ignored even if not NULL. If it is not ignored, it must match
>> * the checksum of the base text against which svndiff data is being
>> * applied; if it does not, @c apply_textdelta or the @a *handler call
>> * which detects the mismatch will return the error
>> * SVN_ERR_CHECKSUM_MISMATCH (if there is no base text, there may
>> * still be an error if @a base_checksum is neither NULL nor the hex
>> * MD5 checksum of the empty string).
>>
>> All the docs guarantee is that this checksum must match the base
>> checksum against which the delta is being applied. They don't assume
>> that the receiver has the content, nor that it will even pay attention
>> to the checksums themselves. If the receiver has done some kind of
>> backdoor secret handshake that's outside the scope of the delta
>> editor, and it should (reasonably) not care.
>>
>> I maintain that it is still legal (though perhaps not very useful) for
>> the base checksum supplied by apply_textdelta() to be whatever
>> apply_textdelta() wants it to be.
>>
>>> I think this is the wrong fix, for compatibility purposes. The
>>> Ev2->Ev1 shim should have a callback to say "please get me the
>>> context/checksum to generate a delta against."
>>>
>>> If the Ev2->Ev1 shim is ever auto-inserted for backwards compat
>>> reasons, then this is breakage. On that basis, I've gotta -1 shipping
>>> and using the shim as-is. Using it is a long ways out, so this hack is
>>> fine for now and for development. So you've got a long while before it
>>> needs to get fixed :-)
>>
>> Sure, but I'm concerned that if this doesn't get fixed while we're
>> thinking about it now, it will languish for a while, and the bite
>> somebody when they come back to address it months/years from now.
>
> And I'll add that maybe the right answer is to just revert r1241097
> and go back to passing NULL as the base checksum (which receivers are
> instructed to ignore anyway). We'd have to figure out how to address
> the test failure caused by this behavior, but that seems like it's the
> easier course of action at this point.

That is certainly an easier approach. "You use the old interface?
Fine. You won't get base checksum verification cuz we'll always pass
NULL." ... that seems like a fine position to take.

Cheers,
-g
Received on 2012-02-06 21:41:39 CET

This message: [ Message body ]
Next message: C. Michael Pilato: "Re: Why do we check the base checksum so often?"
Previous message: Greg Stein: "Re: svn commit: r1241097 - in /subversion/trunk/subversion: libsvn_client/repos_diff.c libsvn_delta/compat.c libsvn_fs_base/tree.c libsvn_fs_fs/tree.c libsvn_wc/externals.c libsvn_wc/update_editor.c"
In reply to: Hyrum K Wright: "Re: Why do we check the base checksum so often?"
Next in thread: C. Michael Pilato: "Re: Why do we check the base checksum so often?"
Reply: C. Michael Pilato: "Re: Why do we check the base checksum so often?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]