On Tue, Sep 15, 2009 at 19:57, Neels J Hofmeyr <neels_at_elego.de> wrote:
> Greg Stein wrote:
>> Sorry for the short reply; on phone; more later.
>> Ev2 doesn't care about text deltas. The stream passed in set_text is the
>> new, desired contents. If the receiver wants to construct a delta, then
>> it needs to discover a base somehow, and use that. In the RA commit
>> editor case, it wants a delta for the wire, so it would take a WC
>> callback at editor-construction time to get a stream of the base contents.
> what? Isn't it just a revision number per file? The repos already has BASE
> right there.
>> The update and merge editors in the WC don't need deltas; they just use
>> the new contents.
> uh, what? -1.
> We've got deltas lying around everywhere, don't see how 'update' should send
> entire source files: We've got BASE and HEAD, should be a straightforward
> delta to apply to WORKING.
> Could you drive your point home for me? I don't buy it.
The delta streams are *only* used over the wire.
Ev2 is designed to pass the fulltext through the API. The receiver
takes the fulltext and can store it, compute a delta, or even drop it
on the floor. In the Ev2 world, I envision the following four
scenarios around delta computation:
1) RA *sends* a delta to REPOS: the RA layer takes ACTUAL (as
fulltext), computes a delta against BASE (fetched via WC callback),
and sends that to REPOS.
2) RA *receives* a delta from REPOS: the RA layer fetches BASE from
the WC, applies the delta, and passes the new fulltext to set_text().
3) REPOS *sends* a delta to RA: REPOS gets BASE and TARGET from the
FS, computes the delta, and sends it over the wire.
4) REPOS *receives* a delta from RA: REPOS gets BASE from the FS,
applies the delta, and passes that to svn_fs_apply_text().
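To make scenario 2 concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy delta format (a list of copy/insert ops, not svndiff), the RecordingEditor class, the fetch_base callback, and the set_text() argument order are all hypothetical stand-ins, not the real Ev2 or RA API. The point is only that the delta is resolved *before* the editor is driven, so the receiver sees nothing but fulltext.

```python
import hashlib
import io


def apply_delta(base, ops):
    """Apply a toy delta: ("copy", offset, length) or ("insert", data) ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError("unknown delta op: %r" % (op[0],))
    return bytes(out)


class RecordingEditor:
    """Hypothetical stand-in for an Ev2 receiver; it only ever sees fulltext."""

    def __init__(self):
        self.texts = {}

    def set_text(self, path, checksum, contents):
        data = contents.read()
        if hashlib.sha1(data).hexdigest() != checksum:
            raise ValueError("checksum mismatch for %s" % path)
        self.texts[path] = data


def receive_delta(editor, fetch_base, path, ops):
    """Scenario 2: fetch BASE, apply the wire delta, pass fulltext on."""
    base = fetch_base(path)              # BASE comes from the WC callback
    fulltext = apply_delta(base, ops)    # delta handling stays in this layer
    checksum = hashlib.sha1(fulltext).hexdigest()
    editor.set_text(path, checksum, io.BytesIO(fulltext))


# Usage: a dict plays the role of the working copy's BASE store.
wc_base = {"trunk/README": b"hello world\n"}
editor = RecordingEditor()
receive_delta(editor, wc_base.__getitem__, "trunk/README",
              [("copy", 0, 6), ("insert", b"Ev2\n")])
```

The editor never learns that a delta existed; swapping the wire format would only touch receive_delta() and apply_delta().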
I'll note that REPOS, on the server side, is a bit looser with editors than
the client. Conceptually, you could imagine Ev2 implementing
set_text() as svn_fs_apply_text() when the FS is the receiver. When
the FS is the driver, it would send TARGET into set_text(), and let
REPOS fetch BASE *if needed*.
That last "if needed" is the important point here. It is REPOS that
determines whether a delta is needed. NOT FS. In Ev1, the driver (the
FS) computed a delta and shoved it into the API. But if the client
does not have a BASE, then REPOS would have to (re)generate the
fulltext and send that down the wire. The delta computation was
Now, consider libsvn_ra_local. No delta computation ever has to occur.
RA just writes the full text to REPOS, which writes it into the FS. This stuff
is all linked together, so no need for computing/sending/applying
deltas exists. Just pass it through in-memory streams from the source
into the FS.
Consider 'svn import'. The RA networking layers can do a simple zlib
compression on the wire, sending the fulltext up to the server. It
*could* do a delta computation against the empty file for a bit of
self-compression, but the space savings are comparable, so it is
actually nicer to just go with standard compression (which helps with
intermediary servers, proxies, etc; they may try to compress one of
our delta-compressed files because they think none has occurred; so
those intermediaries would be doing senseless work; but if the request
was tagged as 'deflate', then all is good -- no extra compression and
tools can look at the fulltext if desired).
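A rough sketch of the "just compress the fulltext" option, using Python's standard zlib module. The header dictionary and the function names are hypothetical and only illustrate the idea of tagging the payload as 'deflate'; this is not how any RA layer is actually written.

```python
import zlib


def encode_for_wire(fulltext):
    """Compress fulltext and tag it so intermediaries know it is compressed."""
    headers = {"Content-Encoding": "deflate"}   # illustrative tag only
    return headers, zlib.compress(fulltext)


def decode_from_wire(headers, body):
    """Recover the fulltext; untagged bodies are passed through unchanged."""
    if headers.get("Content-Encoding") == "deflate":
        return zlib.decompress(body)
    return body


# Usage: an imported file goes up as compressed fulltext, no BASE involved.
imported = b"line of imported text\n" * 200
headers, body = encode_for_wire(imported)
restored = decode_from_wire(headers, body)
```

Because the payload is ordinary compressed fulltext, any tool that understands the tag can inflate it and inspect the contents without knowing anything about deltas.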
Consider 'svn merge': as changes are being applied to the working
copy, why should a delta ever happen? The working copy wants fulltext.
Sure, 'svn patch' will be applying a patch, but there aren't really any
delta needs anywhere in merge/patch. If the RA needs to request a file
source, then it can certainly provide a BASE, get a delta, and
generate a fulltext; but I'll note this RA behavior is operating
outside of the Editor API anyways.
So. For all these various reasons/scenarios, Ev2 operates with
fulltext. If the *receiver* determines that a delta is needed, then it
can do so. But that is outside the scope of the Ev2 system.
Note that the receiver does need some kind of callback, arranged at
editor construction time, in order to fetch BASE fulltexts so that it
can produce a delta.
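As a sketch of that callback arrangement, assuming hypothetical names throughout (CommitReceiver, fetch_base, send, and the toy copy/insert delta are all made up for illustration): the receiver is handed a BASE-fetching callback when it is constructed, and decides for itself inside set_text() whether to produce a delta or forward the fulltext.

```python
import difflib
import io


def compute_delta(base, target):
    """Build a toy delta of ("copy", offset, length) / ("insert", data) ops."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": the base bytes are simply not copied
    return ops


class CommitReceiver:
    """Hypothetical receiver; the driver only ever hands it fulltext."""

    def __init__(self, fetch_base, send):
        self._fetch_base = fetch_base   # callback arranged at construction
        self._send = send               # stand-in for the wire

    def set_text(self, path, contents):
        target = contents.read()
        base = self._fetch_base(path)   # returns None when there is no BASE
        if base is None:
            self._send(path, ("fulltext", target))
        else:
            self._send(path, ("delta", compute_delta(base, target)))


# Usage: dicts stand in for the WC's BASE store and for the network.
base_store = {"a.txt": b"one\ntwo\nthree\n"}
sent = {}
receiver = CommitReceiver(base_store.get, sent.__setitem__)
receiver.set_text("a.txt", io.BytesIO(b"one\n2\nthree\n"))
receiver.set_text("new.txt", io.BytesIO(b"brand new\n"))
```

The driver's calls are identical in both cases; only the receiver knows (or cares) that one of them went out as a delta.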
>> Also note that the content stream should typically lazy-load. The
>> receiver may close it immediately because it has the contents already,
>> as identified by the checksum.
> Sorry, don't understand what you mean. lazy-load vs. close immediately?
As a simple example, let's say that you are doing a commit. You have
manually copied a versioned file (rather than 'svn copy'). At commit
time, set_text() is called specifying this new file's checksum. The RA
layer knows the server already has a file with that checksum (through
some hand-wave magic irrelevant to the point here), and so the RA
layer simply *closes* the file contents stream. It doesn't want to
send the contents to the server, but just the checksum instead.
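A small sketch of this commit case, with hypothetical names (LazyFileStream, send_text, known_checksums): the content stream defers opening its file until the first read, so a receiver that already has the contents by checksum can close it without any file I/O having happened.

```python
import hashlib
import os
import tempfile


class LazyFileStream:
    """Content stream that opens its file only when first read."""

    def __init__(self, path):
        self._path = path
        self._file = None
        self.opened = False     # exposed so the example can observe it

    def read(self, size=-1):
        if self._file is None:
            self._file = open(self._path, "rb")
            self.opened = True
        return self._file.read(size)

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def send_text(known_checksums, checksum, stream):
    """Hypothetical receiver side: skip the contents if already known."""
    try:
        if checksum in known_checksums:
            return ("checksum-only", checksum)   # stream is never read
        return ("fulltext", stream.read())
    finally:
        stream.close()


# Usage: the same file, once with a known checksum and once without.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "copied.txt")
    data = b"same contents as an existing file\n"
    with open(path, "wb") as f:
        f.write(data)
    checksum = hashlib.sha1(data).hexdigest()

    known = LazyFileStream(path)
    known_result = send_text({checksum}, checksum, known)

    unknown = LazyFileStream(path)
    unknown_result = send_text(set(), checksum, unknown)
```

In the first call the file is never opened at all; in the second it is opened on the first read() and closed by the receiver.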
In this scenario, you can optimize the client I/O by *waiting* to open
the underlying fulltext file until the first read comes in. So for the
case where a read never occurs (just a close), then the filesystem is
>> On completion: if you don't complete something right away, then you need
>> an additional signal from the driver that the node is (now) complete.
>> You may also need to keep state on the incomplete node, and with no
>> particular ordering, you must hold all of those states for an
>> indeterminate time.
> well, until the editor completes or aborts, at most. Why forbid this, given
> an implementation does want to do that? (like, as we once hypothesized, a
> conversion layer from editor v2 to v1 would have to?)
If an implementation wants to do that, then fine. But omitting the
completeness aspect means that all receivers *must* retain some notion
of incompleteness for an indefinable time period.
It simplifies the receivers, and creates a *much* better resumability
story, to specify all the children at add_directory() time.
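A toy illustration of why that helps, under assumed names and signatures (TrackingReceiver and its add_directory(path, children) / add_file(path) methods are hypothetical, not the real editor API): because the child list arrives up front, the receiver can tell on its own when a directory has received everything it was promised, and its pending table says exactly what is still missing if the drive is interrupted.

```python
class TrackingReceiver:
    """Hypothetical receiver that tracks which promised children arrived."""

    def __init__(self):
        self.pending = {}      # directory path -> child names still expected
        self.complete = set()  # paths with nothing further expected

    def add_directory(self, path, children):
        self._arrived(path)
        if children:
            self.pending[path] = set(children)
        else:
            self.complete.add(path)

    def add_file(self, path):
        self._arrived(path)
        self.complete.add(path)

    def _arrived(self, path):
        # Tick this node off its parent's list of expected children.
        parent, _, name = path.rpartition("/")
        waiting = self.pending.get(parent)
        if waiting is not None:
            waiting.discard(name)
            if not waiting:
                del self.pending[parent]
                self.complete.add(parent)


# Usage: directory A announces its two children up front.
receiver = TrackingReceiver()
receiver.add_directory("A", ["f", "B"])
receiver.add_file("A/f")
missing_midway = {k: set(v) for k, v in receiver.pending.items()}
receiver.add_directory("A/B", [])
```

No separate "this node is now complete" signal from the driver is needed; the bookkeeping falls out of the child list given when the directory was added.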
Received on 2009-09-16 05:57:08 CEST