
Re: more editor v2 (3)

From: Greg Stein <gstein_at_gmail.com>
Date: Wed, 16 Sep 2009 21:43:18 -0400

On Wed, Sep 16, 2009 at 19:07, Neels J Hofmeyr <neels_at_elego.de> wrote:
> Greg Stein wrote:
>...
>> 2) RA *receives* a delta from REPOS: the RA layer fetches BASE from
>> the WC, applies the delta, and passes the new fulltext to set_text()
>
> And RA told REPOS what the current BASE revnum is, right?
> BTW, is BASE always at one single revision across the WC? What if I had a
> few paths 'update -r123'd to a different revision?

Each path is paired with a revision, in order to support a
mixed-revision working copy. There is a lot of optimization in the
client/server communication to represent those pairings, but
logically every path has its own revision number.
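
As a rough illustration of that last point (a sketch only: the function name and the dict layout are made up here, and this is not the actual wire format), one way to shrink the pairings is to report only the paths whose revision differs from the nearest ancestor's, while every path still logically keeps its own revision:

```python
from typing import Dict


def compact_report(revs: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: keep only paths whose revision differs
    from that of their nearest ancestor ("" is the working copy root)."""
    out = {}
    for path in sorted(revs):
        inherited = None
        parent = path
        # Walk up the path looking for the nearest ancestor with a revision.
        while "/" in parent:
            parent = parent.rsplit("/", 1)[0]
            if parent in revs:
                inherited = revs[parent]
                break
        # Fall back to the root entry for top-level paths.
        if inherited is None and path != "" and "" in revs:
            inherited = revs[""]
        if inherited != revs[path]:
            out[path] = revs[path]
    return out


wc = {"": 122, "A": 122, "A/f.c": 123, "A/g.c": 122, "B": 120, "B/h.c": 120}
report = compact_report(wc)
```

Here only the root, the one updated file, and the subtree sitting at an older revision need to be mentioned explicitly.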

>...
> Now, I still have a problem with it, in the case of 'diff', and I think also
> 'merge':
>
> $ svn diff ^/A -c123

Woah. Careful here. 'svn diff' is not modifying anything, so an editor
"should not" be involved here. I think we do have one in there
somewhere, and/or some diff callback thingies, but the point of the
editor is to *modify a tree*.

That said, in order to discover the differences between r122 and r123,
we *could* serialize the operations needed to modify r122, send it
over the wire, and then the diff code uses that to report the changes.
I don't recall how diff works, to be honest.

> What makes most sense to me:
> - REPOS gets the two revs from FS and computes a delta.
> - RA transports the delta back to client
>
> ok, now what. Receiver gets a set_text() of delta applied to current BASE?
> nah. The Ev2 wire sends r123 in full and client has to fetch another
> fulltext r122 via RA to compute a delta client side?? Nah!

Nope.

When presented a situation like this, the client will do one of two things:

1) if a "good" BASE is present for the file in question (there's the
sticky point: do we have a local copy of *that* file?), then the
client asks the server "give me a text delta to construct r122 from
BASE", and then it says "give me another to construct r123". It then
applies the two deltas against BASE to generate the fulltexts.

2) if no "good" BASE exists, then it fetches a fulltext for r122, a
text delta for r122:123, and applies the textdelta to produce the
second fulltext.
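
The two cases above can be sketched as follows. This is a toy model in Python, not Subversion code: `ToyServer`, `fetch_pair`, and the copy/new delta format are all assumptions made for illustration, and real svndiff deltas look nothing like this.

```python
from difflib import SequenceMatcher


def make_delta(source: bytes, target: bytes):
    """Toy text delta: copy ranges of source, or insert new bytes."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("new", target[j1:j2]))
        # "delete" emits nothing: those source bytes are simply skipped.
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        out += source[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


class ToyServer:
    """Hypothetical stand-in for the server; maps revnum -> fulltext."""

    def __init__(self, revs):
        self.revs = revs

    def fulltext(self, rev):
        return self.revs[rev]

    def delta(self, from_rev, to_rev):
        return make_delta(self.revs[from_rev], self.revs[to_rev])


def fetch_pair(server, old_rev, new_rev, base=None):
    """Return (old fulltext, new fulltext). base is (revnum, text) or None."""
    if base is not None:
        # Case 1: a good BASE exists, so ask for two deltas against it.
        base_rev, base_text = base
        old = apply_delta(base_text, server.delta(base_rev, old_rev))
        new = apply_delta(base_text, server.delta(base_rev, new_rev))
    else:
        # Case 2: no BASE, so one fulltext plus one delta between the revs.
        old = server.fulltext(old_rev)
        new = apply_delta(old, server.delta(old_rev, new_rev))
    return old, new


server = ToyServer({120: b"a\nb\n", 122: b"a\nb\nc\n", 123: b"a\nB\nc\n"})
with_base = fetch_pair(server, 122, 123, base=(120, b"a\nb\n"))
without_base = fetch_pair(server, 122, 123)
```

Either way the caller ends up holding two fulltexts; the deltas are purely a transport optimization.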

And again: back up a bit: I'm not sure that an editor drive can/should
make sense for 'svn diff' so a set_text() might never happen (on the
client). In the absence of knowing what is happening, and given two
minutes' thought, I would implement diff by generating (on the server)
a list of text-modified files and report those to the client. The
client would then start a series of appropriate requests to get the
fulltexts for those files, and then produce the (unified) diffs.
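
A minimal sketch of that approach, in Python for brevity. The server's list of text-modified paths and the `fetch_fulltext` callback are assumed inputs (hypothetical names, not an RA API); the unified diff itself comes from the standard `difflib` module.

```python
import difflib


def diff_revisions(changed_paths, fetch_fulltext, old_rev, new_rev):
    """Build unified diffs for the paths the server reported as modified.

    fetch_fulltext(path, rev) -> str is a hypothetical callback standing
    in for whatever requests the client would make to get a fulltext.
    """
    chunks = []
    for path in changed_paths:
        old = fetch_fulltext(path, old_rev).splitlines(keepends=True)
        new = fetch_fulltext(path, new_rev).splitlines(keepends=True)
        chunks.extend(difflib.unified_diff(
            old, new,
            fromfile=f"{path}\t(revision {old_rev})",
            tofile=f"{path}\t(revision {new_rev})",
        ))
    return "".join(chunks)


# Toy repository contents, keyed by (path, revision).
texts = {("A/f.c", 122): "a\nb\nc\n", ("A/f.c", 123): "a\nB\nc\n"}
patch = diff_revisions(["A/f.c"], lambda p, r: texts[(p, r)], 122, 123)
```

No editor drive and no set_text() call appears anywhere in this flow, which is the point being made above.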

> (Similarly, merge wants to know a delta between things that aren't
> necessarily known on client side, s.b.)
>
> So for diff, I'd actually lose Ev2 and construct a separate diff/merge tree
> delta communication. Red alert!

I actually have no problem with it. One of my issues with Ev1 is that
it was used in inappropriate places. It was the hammer, and people saw
nails everywhere. Go look at revision_status.c one day. But maybe
smoke some weed first so you can blame the horrors you see on
hallucinogens.

>> I'll note that REPOS the server side is a bit looser with editors than
>> the client. Conceptually, you could imagine Ev2 implementing
>> set_text() as svn_fs_apply_text() when the FS is the receiver. When
>> the FS is the driver, it would send TARGET into set_text(), and let
>> REPOS fetch BASE *if needed*.
>>
>> That last "if needed" is the important point here. It is REPOS that
>> determines whether a delta is needed. NOT FS. In Ev1, the driver (the
>> FS) computed a delta and shoved it into the API. But if the client
>> does not have a BASE, then REPOS would have to (re)generate the
>> fulltext and send that down the wire. The delta computation was
>> totally useless.
>
> I see how that makes sense. But fetching a delta from FS vs. fetching a
> complete text from FS is not dependent on what's sent over the wire. I'd
> humbly suggest not plugging the editor directly into FS... ?

Oh, it is *very* dependent upon what gets sent over the wire. The
client requests REPOS "give me a fulltext" or "give me a textdelta
against $BASE". The REPOS then needs to form the reply in the most
optimal way possible. *If* the editor interface happens to be in this
control flow, then Ev1 would always compute and give REPOS a textdelta
which would require additional work to restore a fulltext for the
wire.
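
To make the control flow concrete, here is a toy sketch (Python, with made-up names; `Repos.reply` and the prefix/suffix delta are illustrative assumptions, not Subversion's API or delta format). REPOS starts from the fulltext and only pays for a delta when the client actually named a BASE:

```python
def toy_delta(base: bytes, target: bytes):
    """Toy delta: (shared prefix length, shared suffix length, new middle)."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    while s < limit - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return (p, s, target[p:len(target) - s])


class Repos:
    """Hypothetical REPOS layer over a dict of (path, rev) -> fulltext."""

    def __init__(self, fs):
        self.fs = fs
        self.deltas_computed = 0

    def reply(self, path, rev, base_rev=None):
        target = self.fs[(path, rev)]
        if base_rev is None:
            # Client asked for a fulltext: no delta work at all.
            return ("fulltext", target)
        # Client named a BASE: now, and only now, compute a delta.
        self.deltas_computed += 1
        return ("textdelta", toy_delta(self.fs[(path, base_rev)], target))


repos = Repos({("f", 122): b"abc", ("f", 123): b"abXc"})
full = repos.reply("f", 123)
work_after_fulltext = repos.deltas_computed
delta = repos.reply("f", 123, base_rev=122)
```

With an Ev1-style driver sitting in this path, the delta would already have been computed before REPOS knew which form the client wanted.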

The Ev2 interface is actually better aligned with FS than Ev1. The FS
interface is a random access interface, without the ordering
requirements of Ev1. I don't recall when/where we use Ev1 on the server
side (ra_local certainly, and maybe the server side of ra_svn;
mod_dav_svn does not use a commit editor, though it might use an
editor in some of the report generation).

>> Now, consider libsvn_ra_local. No delta computation ever has to occur.
>> RA just writes the full text, which writes it into the FS. This stuff
>> is all linked together, so no need for computing/sending/applying
>> deltas exists. Just pass it through in-memory streams from the source
>> into the FS.
>
> Mostly yes, but don't forget 'svn diff' and 'svn merge'. (s.b.)

Those do not deal with text deltas, but rather unified diff and
embedded conflict markers. And in order to do *those* things, they use
fulltexts.

So again: within the client, you aren't going to be slogging around text
deltas. The RA layer will, in order to optimize fulltext fetching from
the server, but that has very little to do with the editor APIs.

>> Consider 'svn import'. The RA networking layers can do a simple zlib
>> compression on the wire, sending the fulltext up to the server. It
>> *could* do a delta computation against the empty file for a bit of
>> self-compression, but the space savings are comparable, so it is
>> actually nicer to just go with standard compression (which helps with
>> intermediary servers, proxies, etc; they may try to compress one of
>> our delta-compressed files because they think none has occurred; so
>> those intermediaries would be doing senseless work; but if the request
>> was tagged as 'deflate', then all is good -- no extra compression and
>> tools can look at the fulltext if desired).
>>
>> Consider 'svn merge': as changes are being applied to the working
>> copy, why should a delta ever happen?
>
> To me it seems unavoidable. I can ask svn to merge something unrelated
> enough that BASE doesn't make sense anywhere. Hypothetically, if I merge a
> tiny change, for sake of argument something that doesn't even merge cleanly
> onto my current branch, with or without local modifications in the WC, then
> there's no fulltext anywhere that makes any sense at all, even over
> ra_local. The only useful information is the delta itself. The Receiver has
> to deal with the delta and has to determine whether it applies cleanly
> (merge, and update when there are local mods), or has to print out the delta
> 1:1 (diff). It's the callback implementations that want to set the conflict
> markers in case a delta doesn't apply cleanly. I don't see this stuff move
> out to the 'special case' section as well.

Hmm. I think the above paragraph is conflating unified diffs (or
"patch files") with the concept of a Subversion text delta.

> And ...
>
>> The working copy wants fulltext.
>> Sure, 'svn patch' will be applying a patch, but there isn't really any
>> delta needs anywhere in merge/patch. If the RA needs to request a file
>> source, then it can certainly provide a BASE, get a delta, and
>> generate a fulltext; but I'll note this RA behavior is operating
>> outside of the Editor API anyways.
>>
>> So. For all these various reasons/scenarios, Ev2 operates with
>> fulltext. If the *receiver* determines that a delta is needed, then it
>> can do so. But that is outside the scope of the Ev2 system.
>>
>> Note that the receiver does need some kind of callback, arranged at
>> editor construction time, in order to fetch BASE fulltexts so that it
>> can produce a delta.
>
> ... I don't *want* to fetch fulltexts via callbacks, I want tree edits with
> embeddable text edits. Maybe we need a separate callback for the cases where
> receiving a pure delta makes more sense... hm.

Are you talking about patches, or about svn text deltas?

Also, I do not think that 'svn merge' is purely an editor operation
since there are *three* sources, resulting in an output. The editor
interface is about transforming *one* source into a target state.

>...
>>>> On completion: if you don't complete something right away, then you need
>>>> an additional signal from the driver that the node is (now) complete.
>>>> You may also need to keep state on the incomplete node, and with no
>>>> particular ordering, you must hold all of those states for an
>>>> indeterminate time.
>>> well, until the editor completes or aborts, at most. Why forbid this, given
>>> an implementation does want to do that? (like, as we once hypothesized, a
>>> conversion layer from editor v2 to v1 would have to?)
>>
>> If an implementation wants to do that, then fine. But omitting the
>> completeness aspect means that all receivers *must* retain some notion
>> of incompleteness for an indefinable time period.
>>
>> It simplifies the receivers, and creates a *much* better resumability
>> story, to specify all the children at add_directory() time.
>
> In my API comments, I coined the 'must complete' as a 'Receiving
> restriction'. Here you say "If an implementation wants to do that, then
> fine" -- so I'd rather write 'should complete' instead of 'must'.

No no... I meant "if an implementation wants to hold a gargantuan
amount of state, then it can". But it MUST complete the action.

(and yes, we can quibble on what "complete" means; for example, if the
Receiver held every operation in memory, then applied it all during
the complete() call, then for all appearances, each of those
operations completed)

The thing that particularly concerns me is setting up calling
dependencies and ordering across functions in the editor interface.
That can cause a lot of problems if somebody misses the requirement.
However, if each operation can be *completed* with no further work,
then we have a much simpler and less error-prone API.
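
A small sketch of what that could look like, in Python. The method names add_directory(), set_text(), and complete() come from this thread, but the signatures and the in-memory tree are assumptions for illustration only. Each call carries everything needed to finish that operation (the directory's children are given at add time), and a receiver that prefers to buffer can still apply it all during complete():

```python
class BufferingReceiver:
    """Hypothetical Ev2-style receiver that defers work until complete()."""

    def __init__(self):
        self.pending = []   # operations held in memory
        self.tree = {}      # path -> ("dir", children) or ("file", text)

    def add_directory(self, path, children):
        # All children are named up front, so no later "now it is
        # complete" signal is needed for this node.
        self.pending.append((path, ("dir", tuple(children))))

    def set_text(self, path, fulltext):
        # Fulltext, not a delta: nothing else is needed to finish this op.
        self.pending.append((path, ("file", fulltext)))

    def complete(self):
        # Each buffered operation is self-contained, so order between
        # different paths does not matter here.
        for path, node in self.pending:
            self.tree[path] = node
        self.pending.clear()

    def abort(self):
        self.pending.clear()


r = BufferingReceiver()
r.set_text("A/f.c", b"hello\n")      # file before its directory: still fine
r.add_directory("A", ["f.c"])
before = dict(r.tree)
r.complete()
```

Because no operation depends on a later call to become whole, the driver is free to issue them in any order.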

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2395782
Received on 2009-09-17 03:43:34 CEST

This is an archived mail posted to the Subversion Dev mailing list.