
Re: more editor v2 (3)

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Thu, 17 Sep 2009 01:07:36 +0200

Greg Stein wrote:
>> Could you drive your point home for me? I don't buy it.
>
> The delta streams are *only* used over the wire.

Ahhhh I seee!

> Ev2 is designed to pass the fulltext through the API. The receiver
> takes the fulltext and can store it, compute a delta, or even drop it
> on the floor. In the Ev2 world, I envision the following four
> scenarios around delta computation:
>
> 1) RA *sends* a delta to REPOS: the RA layer takes ACTUAL (as
> fulltext), computes a delta against BASE (fetched via WC callback),
> and sends that to REPOS.
cool

> 2) RA *receives* a delta from REPOS: the RA layer fetches BASE from
> the WC, applies the delta, and passes the new fulltext to set_text()
And RA told REPOS what the current BASE revnum is, right?
BTW, is BASE always at one single revision across the WC? What if I had a
few paths 'update -r123'd to a different revision?

> 3) REPOS *sends* a delta to RA: REPOS gets BASE and TARGET from the
> FS, computes the delta, and sends it over the wire
cool

> 4) REPOS *receives* a delta from RA: REPOS gets BASE from the FS,
> applies the delta, and passes that to svn_fs_apply_text().
cool
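
A minimal sketch of the four scenarios above, assuming a toy copy/data delta
format built on Python's difflib (this is not svndiff, and none of the names
below are Subversion APIs). The point it illustrates is that the editor API
only ever sees fulltext, while the delta exists only between the two wire
endpoints:

```python
import difflib

def compute_delta(base: bytes, target: bytes):
    """Toy delta: a list of ops that rebuild `target` from `base`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))         # reuse bytes already in BASE
        elif tag in ("replace", "insert"):
            ops.append(("data", target[j1:j2]))  # bytes that BASE does not have
        # "delete" needs no op: those BASE bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target fulltext from BASE plus the toy delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def send_side(fulltext: bytes, fetch_base):
    """Scenarios 1 and 3: the sender holds fulltext, fetches BASE, emits a delta."""
    return compute_delta(fetch_base(), fulltext)

def receive_side(delta, fetch_base, set_text):
    """Scenarios 2 and 4: the receiver fetches BASE, applies the delta,
    and hands the resulting fulltext to a (hypothetical) set_text sink."""
    set_text(apply_delta(fetch_base(), delta))
```

Here fetch_base stands in for "fetched via WC callback" on the client and for
"gets BASE from the FS" on the server; both ends must agree on which BASE is
meant, which is the revnum question raised under scenario 2.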

Now, I still have a problem with it, in the case of 'diff', and I think also
'merge':

$ svn diff ^/A -c123

What makes most sense to me:
- REPOS gets the two revs from FS and computes a delta.
- RA transports the delta back to client

ok, now what. Receiver gets a set_text() of delta applied to current BASE?
nah. The Ev2 wire sends r123 in full and client has to fetch another
fulltext r122 via RA to compute a delta client side?? Nah!
(Similarly, merge wants to know a delta between things that aren't
necessarily known on client side, see below.)

So for diff, I'd actually lose Ev2 and construct a separate diff/merge tree
delta communication. Red alert!

> I'll note that REPOS (the server side) is a bit looser with editors than
> the client. Conceptually, you could imagine Ev2 implementing
> set_text() as svn_fs_apply_text() when the FS is the receiver. When
> the FS is the driver, it would send TARGET into set_text(), and let
> REPOS fetch BASE *if needed*.
>
> That last "if needed" is the important point here. It is REPOS that
> determines whether a delta is needed. NOT FS. In Ev1, the driver (the
> FS) computed a delta and shoved it into the API. But if the client
> does not have a BASE, then REPOS would have to (re)generate the
> fulltext and send that down the wire. The delta computation was
> totally useless.

I see how that makes sense. But fetching a delta from FS vs. fetching a
complete text from FS is not dependent on what's sent over the wire. I'd
humbly suggest not plugging the editor directly into FS... ?

> Now, consider libsvn_ra_local. No delta computation ever has to occur.
> RA just writes the full text, which writes it into the FS. This stuff
> is all linked together, so no need for computing/sending/applying
> deltas exists. Just pass it through in-memory streams from the source
> into the FS.

Mostly yes, but don't forget 'svn diff' and 'svn merge'. (see below)

>
> Consider 'svn import'. The RA networking layers can do a simple zlib
> compression on the wire, sending the fulltext up to the server. It
> *could* do a delta computation against the empty file for a bit of
> self-compression, but the space savings are comparable, so it is
> actually nicer to just go with standard compression (which helps with
> intermediary servers, proxies, etc; they may try to compress one of
> our delta-compressed files because they think none has occurred; so
> those intermediaries would be doing senseless work; but if the request
> was tagged as 'deflate', then all is good -- no extra compression and
> tools can look at the fulltext if desired).
>
> Consider 'svn merge': as changes are being applied to the working
> copy, why should a delta ever happen?

To me it seems unavoidable. I can ask svn to merge something unrelated
enough that BASE doesn't make sense anywhere. Hypothetically, if I merge a
tiny change, for sake of argument something that doesn't even merge cleanly
onto my current branch, with or without local modifications in the WC, then
there's no fulltext anywhere that makes any sense at all, even over
ra_local. The only useful information is the delta itself. The Receiver has
to deal with the delta and has to determine whether it applies cleanly
(merge, and update when there are local mods), or has to print out the delta
1:1 (diff). It's the callback implementations that want to set the conflict
markers in case a delta doesn't apply cleanly. I don't see this stuff moving
out to the 'special case' section either.
And ...

> The working copy wants fulltext.
> Sure, 'svn patch' will be applying a patch, but there aren't really any
> delta needs anywhere in merge/patch. If the RA needs to request a file
> source, then it can certainly provide a BASE, get a delta, and
> generate a fulltext; but I'll note this RA behavior is operating
> outside of the Editor API anyways.
>
> So. For all these various reasons/scenarios, Ev2 operates with
> fulltext. If the *receiver* determines that a delta is needed, then it
> can do so. But that is outside the scope of the Ev2 system.
>
> Note that the receiver does need some kind of callback, arranged at
> editor construction time, in order to fetch BASE fulltexts so that it
> can produce a delta.

... I don't *want* to fetch fulltexts via callbacks, I want tree edits with
embeddable text edits. Maybe we need a separate callback for the cases where
receiving a pure delta makes more sense... hm.
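
To make the "callback arranged at editor construction time" idea concrete,
here is a rough sketch under loud assumptions: WireReceiver, fetch_base and
toy_delta are hypothetical names, and the prefix/suffix delta is a stand-in
for a real delta format. It shows a receiver that is always handed fulltext
and decides on its own whether a delta is worth producing:

```python
import io
from typing import Callable, Optional

def toy_delta(base: bytes, target: bytes):
    """Toy delta: (common prefix length, common suffix length, changed middle)."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    while s < limit - p and base[len(base) - 1 - s] == target[len(target) - 1 - s]:
        s += 1
    return (p, s, target[p:len(target) - s])

class WireReceiver:
    """Hypothetical receiver; the BASE callback is supplied at construction."""

    def __init__(self, fetch_base: Callable[[str], Optional[bytes]]):
        self.fetch_base = fetch_base
        self.sent = []  # what would go over the wire

    def set_text(self, path: str, contents: io.BufferedIOBase) -> None:
        fulltext = contents.read()
        base = self.fetch_base(path)
        if base is None:
            # No BASE on this side: computing a delta would be wasted work.
            self.sent.append((path, "fulltext", fulltext))
        else:
            self.sent.append((path, "delta", toy_delta(base, fulltext)))
```

The driver never learns which branch was taken. That is the design choice
under discussion; it does not by itself answer the diff/merge case, where
the delta is the thing the receiver actually wants.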

>
>>> Also note that the content stream should typically lazy-load. The
>>> receiver may close it immediately because it has the contents already,
>>> as identified by the checksum.
>> Sorry, don't understand what you mean. lazy-load vs. close immediately?
>
> As a simple example, let's say that you are doing a commit. You have
> manually copied a versioned file (rather than 'svn copy'). At commit
> time, set_text() is called specifying this new file's checksum. The RA
> layer knows the server already has a file with that checksum (through
> some hand-wave magic irrelevant to the point here), and so the RA
> layer simply *closes* the file contents stream. It doesn't want to
> send the contents to the server, but just the checksum instead.

Okay, I see, thanks. Heh, better not use md5, then ;)
* neels ducks a flying shoe -- avoiding a collision

>
> In this scenario, you can optimize the client I/O by *waiting* to open
> the underlying fulltext file until the first read comes in. So for the
> case where a read never occurs (just a close), then the filesystem is
> never touched.
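
A small sketch of that lazy-load behaviour, assuming hypothetical names
(LazyStream, receive_text, known_checksums) rather than any real Subversion
interface. The underlying file is opened only on the first read, so a
receiver that recognises the checksum and closes the stream right away never
causes any file I/O:

```python
class LazyStream:
    """Defers opening the underlying file until the first read()."""

    def __init__(self, opener):
        self._opener = opener  # zero-argument callable returning a file object
        self._fh = None

    def read(self, size=-1):
        if self._fh is None:
            self._fh = self._opener()  # first read: only now touch the disk
        return self._fh.read(size)

    def close(self):
        # Closing a stream that was never read does not open anything.
        if self._fh is not None:
            self._fh.close()
            self._fh = None

def receive_text(checksum, contents, known_checksums):
    """Hypothetical receiver: skip the contents if the checksum is known."""
    if checksum in known_checksums:
        contents.close()   # send just the checksum; contents never read
        return None
    try:
        return contents.read()
    finally:
        contents.close()
```

A driver would construct it along the lines of
LazyStream(lambda: open(path, "rb")), with path being whatever file backs
the node's fulltext.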
>
>>> On completion: if you don't complete something right away, then you need
>>> an additional signal from the driver that the node is (now) complete.
>>> You may also need to keep state on the incomplete node, and with no
>>> particular ordering, you must hold all of those states for an
>>> indeterminate time.
>> well, until the editor completes or aborts, at most. Why forbid this, given
>> an implementation does want to do that? (like, as we once hypothesized, a
>> conversion layer from editor v2 to v1 would have to?)
>
> If an implementation wants to do that, then fine. But omitting the
> completeness aspect means that all receivers *must* retain some notion
> of incompleteness for an indefinable time period.
>
> It simplifies the receivers, and creates a *much* better resumability
> story, to specify all the children at add_directory() time.

In my API comments, I labeled the 'must complete' rule a 'Receiving
restriction'. Here you say "If an implementation wants to do that, then
fine" -- so I'd rather write 'should complete' instead of 'must'.
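
For illustration only, a sketch of what "specify all the children at
add_directory() time" could buy a receiver. The class and method shapes are
assumptions made for this example, not the Ev2 signatures. Each directory
node is complete the moment it arrives, and the only state kept is the set
of paths that were announced but not yet delivered:

```python
class TreeReceiver:
    """Hypothetical receiver that is told a directory's children up front."""

    def __init__(self):
        self.tree = {}         # path -> child names (dir) or contents (file)
        self.expected = set()  # announced by a parent, not yet delivered

    def add_directory(self, path, children):
        self.expected.discard(path)
        self.tree[path] = tuple(children)  # complete on arrival
        for name in children:
            self.expected.add(f"{path}/{name}")

    def add_file(self, path, contents):
        self.expected.discard(path)
        self.tree[path] = contents

    def missing(self):
        """Paths still owed by the driver, e.g. after an interrupted edit."""
        return sorted(self.expected)
```

After an interruption, missing() is the whole resume list; no separate
"directory is now complete" signal from the driver is needed. A receiver
that prefers to defer completion can still do so, which matches the
'should' rather than 'must' reading.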

~Neels

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2395765

Received on 2009-09-17 01:11:07 CEST

This is an archived mail posted to the Subversion Dev mailing list.
