
Re: Why do we check the base checksum so often?

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Sat, 4 Feb 2012 16:59:41 +0000 (GMT)

Hyrum K Wright wrote:

> Julian Foad wrote:
>> Hyrum K Wright wrote:
>>> The Ev2 shims get in the way of how text deltas are transmitted, by
>>> reconstituting the full text, and then just streaming that to the
>>> receiver via svn_txdelta_send_stream().  I've got a patch which
>>> actually starts reporting the base checksum---which with the shims
>>> will always be the "empty" checksum---and it turns out that
>>> such a patch breaks the World.
>>>
>>> The reason for this breakage is that there are several places in both
>>> the FS and the WC that we check the delta editor's reported base
>>> checksum against some other value we have on hand which we *think*
>>> should be the base.  Until now, these checks have always passed, since
>>> there was an implicit understanding about what the delta editor would
>>> use as its base.
>>>
>>> However, I think that these checks are wrong.  They rely upon an
>>> implementation detail ("is the delta editor sending a text delta
>>> against the base we think it ought to?") rather than the result ("did
>>> we end up with the content we expected to end up with?")
>>
>> When we (the WC update code for example) receive a text delta, we apply it
>> to a text base that we already have, in order to create a new text.  We
>> need to be applying it against the correct base [...]
>
> I understand this principle, but I don't think that's what the API
> is/should be doing.  The apply_textdelta callback is essentially
> saying "apply this delta against the base with this checksum".  In the
> current regime, we know a priori what that base "should" be, so we
> make sure that apply_textdelta spits that information back to us.
>
> But I don't think that's always a valid assumption.  If the delta
> editor chose some other base to use (in this case, the empty stream),
> and indicated that through the apply_textdelta() base checksum
> parameter, a receiver should be happy to accommodate that request.
> "Why should I use the base you told me to use, when I can use this one
> more efficiently?"

We're talking here about the delta editor (Ev1).  The driver shouldn't have free rein to choose any base, because the receiver does not have all possible bases at hand ready to apply the delta onto.  At least in the server-to-client direction (update etc.) the client probably only has one suitable base text per possible file.  Either the server would have to be told what base texts it could choose from, or the client would potentially not be able to apply the delta until it first asks the server to send it the relevant base text, which would pretty much negate the point of having deltified in the first place.  In the other direction, of course, we can now start to design protocols where the client picks any base text that it knows exists in the repository, and the server would be able to access it, now that we have the rep-cache and the idea of looking up texts by their checksum.  But ... that can't be what you're thinking of, I'm sure.

The empty stream is a special case.  It's a valid suggestion to say the driver should have the option of sending a full text, or a delta against an empty stream, which is semantically the same thing.  But retro-fitting that onto Ev1 isn't interesting at this point.
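
To make the receiver's position concrete, here is a rough sketch in Python rather than Subversion's C.  Everything in it is an assumption for illustration: the names choose_base(), apply_delta() and receive_text(), the BaseMismatch error, and the toy copy/new delta format are all hypothetical and are not the svn_delta API or the svndiff format.  It only shows that a receiver can honour a reported base checksum when it actually holds that base (or when the base is the empty stream), and that the final check is on the resulting text.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Checksum of the empty stream: a delta against it is just a full text.
EMPTY_MD5 = md5_hex(b"")


class BaseMismatch(Exception):
    """The driver named a base text that the receiver does not hold."""


def choose_base(local_bases, reported_base_checksum):
    """Pick the base to apply onto (hypothetical helper).

    local_bases maps hex MD5 -> bytes for the texts the receiver has.
    """
    if reported_base_checksum == EMPTY_MD5:
        return b""  # special case: always available
    try:
        return local_bases[reported_base_checksum]
    except KeyError:
        raise BaseMismatch(reported_base_checksum)


def apply_delta(base, delta_ops):
    """Apply a toy delta: ("copy", offset, length) or ("new", data)."""
    out = bytearray()
    for op in delta_ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "new":
            out += op[1]
        else:
            raise ValueError("unknown delta op: %r" % (op[0],))
    return bytes(out)


def receive_text(local_bases, reported_base_checksum, delta_ops,
                 expected_result_checksum):
    """Apply the delta, then verify the result rather than the route."""
    base = choose_base(local_bases, reported_base_checksum)
    result = apply_delta(base, delta_ops)
    if md5_hex(result) != expected_result_checksum:
        raise ValueError("result checksum mismatch")
    return result
```

With a client that holds exactly one base per file, any reported checksum other than that base or the empty one ends in BaseMismatch, which is the situation described above.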

Now, if we talk about Ev2 (I know you're actually looking at the shims between the two), then we've explicitly designed it so that the mechanism for transferring texts is outside the scope of the editor itself, and so the driver and receiver code are responsible (assisted by respective layers above them) for co-ordinating in any way they want to.  The Ev2 solution for deltifying text between driver and receiver could include (warning: possibly hare-brained ideas): the receiver telling the driver what base texts it has available; the driver first choosing a base that's convenient for it, and letting the receiver request that base from the driver (out of band) if the receiver doesn't have it available; and so on.
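
Purely as an illustration of the first of those ideas, here is a small Python sketch.  It is not anything that exists in Ev2 or its shims: advertise_bases() and pick_base() are hypothetical names, and falling back to the empty stream when nothing is shared is my assumption about a sensible default.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# A delta against the empty stream is equivalent to sending the full text.
EMPTY_MD5 = md5_hex(b"")


def advertise_bases(receiver_texts):
    """Receiver side: checksums of the texts it could apply a delta onto."""
    return {md5_hex(text) for text in receiver_texts}


def pick_base(driver_texts, advertised, preferred=None):
    """Driver side: choose a base checksum the receiver is known to hold.

    driver_texts maps hex MD5 -> bytes for the texts the driver can
    deltify against.  Returns EMPTY_MD5 when no base is shared.
    """
    if (preferred is not None and preferred in driver_texts
            and preferred in advertised):
        return preferred
    # No usable preference: take any shared base (sorted only to make
    # the choice deterministic; a real driver would pick by cost).
    shared = sorted(c for c in driver_texts if c in advertised)
    if shared:
        return shared[0]
    return EMPTY_MD5
```

The second idea (the receiver fetching a missing base out of band) would replace the fallback to EMPTY_MD5 with a request back to the driver; that is left out here.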

I'm not quite sure I fully follow you at the moment, so I'm not sure if my reply is on the right track at all.  But it really sounds like you're up against a mismatch of responsibilities: Ev1 sends deltas according to particular rules, whereas Ev2 is designed to be wrapped inside a driver-receiver pairing that knows privately how to deltify and recover the full text in any way it wants to.  The shims obviously need to convert from the Ev2 deltification back (via a full-text intermediary if necessary) to what Ev1 expects.

- Julian
Received on 2012-02-04 18:00:21 CET

This is an archived mail posted to the Subversion Dev mailing list.