Re: Ev2, RA, Commit Process (was: Editor v2 - suggestions and queries)

From: Hyrum K Wright <hyrum.wright_at_wandisco.com>
Date: Tue, 31 Jan 2012 22:27:37 -0600

On Tue, Jan 31, 2012 at 10:08 PM, Greg Stein <gstein_at_gmail.com> wrote:
> I've included an email from back in November for the basis of the
> "next steps" for Ev2 tweaks. As I mentioned yesterday, Hyrum and I got
> a chance to talk at length about Ev2 issues and design. Some extra
> conversations with Philip and Julian also helped out here.
>
> These next steps feel Good, but have some big implications around
> client/server interaction. I'd like to hear any thoughts and concerns.
>
> See below:
>
> On Sat, Nov 5, 2011 at 19:56, Greg Stein <gstein_at_gmail.com> wrote:
>> On Fri, Nov 4, 2011 at 11:16, Julian Foad <julian.foad_at_wandisco.com> wrote:
>>...
>>> Huh? We're now talking about a single call that sets the target and
>>> the properties together. If we take this approach, I suggest naming
>>> the three calls 'set_symlink' (like 'add_symlink'), 'set_file' (like
>>> 'add_file'), and 'set_dir_props' (which is not quite like
>>> 'add_directory').
>>
>> It was never intended to be set_dir_props(). You could set properties
>> on any node. That should stick around, cuz it kind of sucks for the
>> caller to have to know the node type just to set some properties.
>>
>> But I do like where you're going with this. Rather than set_$kind,
>> let's go with alter_file(), alter_symlink(), and alter_directory(). I
>> chose "alter" rather than "change" since change_directory might throw
>> people off with some implied stateful semantics. Each of the alter_*
>> functions would provide for changing the properties, symlink target,
>> file contents, etc. Maybe we eliminate alter_directory() since that
>> would be exactly the same as set_props() [though maybe that becomes
>> alter_props?].
>
> I plan to eliminate set_props() and its COMPLETE parameter. Instead,
> there will be three APIs for changing a node, and they will complete
> before returning. Thus, no more dual calls where a receiver may need
> to retain state to accomplish the entire change. The receiver can
> perform the change to the node, then throw out any temporary state.
> The node won't be touched again.
>
> These entry points will be:
>
> alter_directory(RELPATH, REVISION, PROPS)
> alter_file(RELPATH, REVISION, PROPS, CHECKSUM, CONTENTS)
> alter_symlink(RELPATH, REVISION, PROPS, TARGET)
>
> Again, we use "alter" rather than "change" so that change_directory
> isn't confused with the standard POSIX chdir() call.
>
> set_props, set_text, and set_target all go away.
>
> One more detail about add_file() and alter_file() is noted below.
>
>>...
>>> I wanted to clarify three separate things here.
>>>
>>> (1) Partial read is allowed. Good.
>>>
>>> (2) It's a 'pull-mode' interface. Fine.
>>>
>>> (3) The editor is not allowed to return early and defer the reading
>>> of this stream until it's ready. I wonder if we might want to let the
>>> editor keep several streams open and read from them as and when its
>>> transmit buffer allows, especially if it wants to be able to send two
>>> or more file streams in parallel. These are just shallow thoughts at
>>> the moment.
>>
>> Actually... damn. One of the reasons for separating set_props() from
>> set_text() was to allow for the delayed delivery of contents. Same
>> thing for add_file() and set_text(). We adjusted add_file() to
>> take contents, but that may have been a mistake.
>>
>> At commit time, we want to delay the delivery of the content streams.
>
> Actually, this will be fine. Hyrum and I figured out a different
> (better?) approach. This allows the contents to always be provided at
> add_file() and alter_file() time, rather than allowing a delayed
> delivery. As noted above, two-step interfaces and delayed calls can
> make it difficult for a receiver -- they need to retain some state
> about the node and link up the two calls to complete the
> addition/change.
>
>>...
>> Well... see above, ref: delayed content delivery. The API as I
>> originally designed provided for a delayed delivery. I forgot about
>> that aspect when I acceded to combining add_file/set_text.
>
> Again, we will leave it as-is, and alter_file() will also contain the contents.
>
> When Hyrum and I went through this, we found two occasions when file
> contents are delayed:
>
> 1) at commit time, we note all the changes that will be made, expect a
> "fast-fail" from the server for out-of-date items, and then we start
> delivery of the bulk/file content
>
> 2) at update time, the changes to a file may be noted in the update
> report skeleton, applied via the editor, and then a separate GET is
> run to fetch the contents and then set the contents when it arrives
> (via a delayed apply_textdelta). (Note: for Neon, with its
> mother-report approach, the content is typically present at the time
> of the file's metadata changes)
>
> For problem #2, we will simply make the RA update process (as an Ev2
> driver) manage the delayed state, rather than impose the burden upon
> all Ev2 receivers.
>
> For problem #1, it gets trickier. Hyrum noted that the delayed content
> delivery exists *only* so that we can get the fast-fail on the commit
> process, and then suggested: why don't we simply tell the server the
> entire commit plan, get the response, and *then* start sending all the
> changes to the server?
>
> Thus: we propose to turn the commit process into two parts, and
> corresponding RA interfaces:
>
> Step 1: The commit process tells RA something like "here are all the
> relpaths/revisions that I plan to $operation". Note that it isn't
> really "all paths" since recursive operations like a copy-destination
> path, or a deletion, don't need to list all the child nodes. This
> "plan" is sent to the server, which starts a txn, and examines whether
> any of the operations are being applied to out-of-date nodes. The
> server can allow the commit operation to proceed, or respond with an
> error (possibly, multiple errors!).
>
> Step 2: The commit process then drives an Ev2 commit editor to send
> all the changes to the server. This is a blend of metadata changes and
> content delivery (no delayed content, as before).
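
A rough Python sketch of those two steps, just to make the flow
concrete. Everything here is hypothetical (PlanItem, ToyServer,
check_plan, OutOfDate, commit); none of it is an existing RA or Ev2
API, and the real plan would be some report-like document rather than
a Python list. It only shows the shape: a metadata-only check that can
report several out-of-date paths at once, followed by an editor drive
where each alter_file() call carries its contents.

```python
import hashlib
import io
from dataclasses import dataclass, field


@dataclass
class PlanItem:
    # One entry of the hypothetical commit plan.
    relpath: str
    base_revision: int
    operation: str  # e.g. "alter", "delete", "copy"


class OutOfDate(Exception):
    """Carries every out-of-date relpath, since several may be reported."""

    def __init__(self, relpaths):
        super().__init__("out of date: " + ", ".join(relpaths))
        self.relpaths = relpaths


@dataclass
class ToyServer:
    # relpath -> revision in which the node last changed (toy stand-in for the FS)
    last_changed: dict
    applied: list = field(default_factory=list)

    def check_plan(self, plan):
        # Step 1: fast-fail before any content is sent.
        stale = [item.relpath for item in plan
                 if self.last_changed.get(item.relpath, 0) > item.base_revision]
        if stale:
            raise OutOfDate(stale)

    def alter_file(self, relpath, revision, props, checksum, contents):
        # Step 2 entry point: the node is complete when this returns,
        # so the receiver keeps no per-node state between calls.
        self.applied.append((relpath, revision, props, checksum, contents.read()))


def commit(server, plan, drive_editor):
    server.check_plan(plan)  # step 1: metadata only
    drive_editor(server)     # step 2: metadata and contents together


def send_one_file(editor):
    # Example driver: one file change, contents supplied up front.
    data = b"hello\n"
    editor.alter_file("trunk/a.c", 5, {}, hashlib.sha1(data).hexdigest(),
                      io.BytesIO(data))
```

With a plan that lists trunk/b.c at a base revision older than the
server's last change, commit() raises before send_one_file() is ever
called, which is the fast-fail behaviour we are after.
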
>
> The "plan" is some new XML report-like document, posted to the "me"
> resource on the server to create the FS txn and perform the check. I'm
> not sure what the schema looks like, what kinds of data items are
> needed, nor what the RA API looks like. This "plan" is probably an
> opaque object constructed by the commit process. It would be nice to
> have this in libsvn_ra, and the internals available to all RA layers.
> This plan object may be able to replace the "commit info" stuff that
> we have in the client today (preferable).
>
> For backwards compatibility, the RA API still needs to provide for the
> old commit process. That *may* be mappable to the new server "plan"
> protocol, but I'm not sure. The RA layer may need to retain too much
> information in memory (specifically: properties), until the
> apply_textdelta calls arrive with the content (the first one signaling
> the end of the Step 1, and the beginning of Step 2). It can probably
> use the Ev1 interface calls to construct a plan, and then use the new
> "send-plan" machinery at the transition stage. This compatibility code
> may be able to live in libsvn_ra and be implemented in terms of the
> new RA APIs (plan + Ev2). Thus, all RA layers may be able to get rid
> of their Ev1 code and just implement plan+Ev2.
>
> A new client talking to an old server would simply make Step 1 (the
> plan) perform all the working resource checkouts, which is how an old
> server performs the out-of-date checks. When Step 2 is run, the
> changes to the working resources are performed.
>
> .... okay. That's my brain dump for now.
>
> Thoughts?

I've had a few since last week, so I'll throw 'em out there.

One of the purposes the "commit plan" could accomplish is establishing
a priori what content the server has, and what content we need to
send, identified via sha1. Thus, if a commit is something like a
merge, where most of the content is already on the server, the client
doesn't have to transmit that information again. (Though maybe this
is already served by the lazy-load semantics of the Ev2 contents
streams.)
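
As a minimal sketch of that negotiation, assuming the plan carries one
sha1 per file and the server can answer "which of these do I lack?"
(both assumptions; sha1_hex, missing_on_server and contents_to_send
are made-up names, and the server store is just a dict here):

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def missing_on_server(plan_checksums, server_store):
    """Return the checksums from the plan that the server does not hold yet."""
    return {c for c in plan_checksums if c not in server_store}


def contents_to_send(local_files, server_store):
    """Map checksum -> bytes for only the content the server still needs.

    local_files is relpath -> bytes; server_store is checksum -> bytes.
    """
    by_checksum = {sha1_hex(data): data for data in local_files.values()}
    needed = missing_on_server(by_checksum.keys(), server_store)
    return {c: by_checksum[c] for c in needed}
```

For a merge-like commit where most file contents already exist in the
repository, contents_to_send() would come back nearly empty.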

Another (complementary?) approach might be to totally divorce content
from structure. Just remove contents streams from Ev2 completely.
We'd still have the checksums, and so Ev2 would still be making edits
to trees in a well-defined way; we'd just be leaving the actual
transmittal of the content to some other process, which may or
may not be in band. In this paradigm, Ev2 goes back to being the
"commit plan", and we still get post-fix text deltas. Though, given
our current filesystem limitations, this design might not be doable.
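
Very roughly, and purely as a thought experiment (StructureOnlyTxn,
put_content, missing and can_commit are invented for this sketch, and
it ignores whatever the filesystem would actually need), the split
might look like this: the editor call records only the checksum, and
the bytes arrive through a separate channel keyed by sha1.

```python
import hashlib


class StructureOnlyTxn:
    """Toy transaction: tree edits and content delivery are independent."""

    def __init__(self):
        self.nodes = {}     # relpath -> (props, checksum)
        self.contents = {}  # checksum -> bytes

    def alter_file(self, relpath, revision, props, checksum):
        # Structure only: no contents stream, just the expected checksum.
        self.nodes[relpath] = (props, checksum)

    def put_content(self, data):
        # Separate channel (in band or not); content is keyed by its sha1.
        checksum = hashlib.sha1(data).hexdigest()
        self.contents[checksum] = data
        return checksum

    def missing(self):
        # Checksums referenced by the tree that have not arrived yet.
        referenced = {checksum for _, checksum in self.nodes.values()}
        return sorted(referenced - set(self.contents))

    def can_commit(self):
        return not self.missing()
```

The transaction only becomes committable once every checksum named by
the tree edits has shown up, in whatever order the content arrives.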

(All this, of course, makes my backward compat shims even more
heinous, but as long as we're redesigning The World, let's look at it
from as many angles as reasonable.)

-Hyrum

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Received on 2012-02-01 07:15:11 CET
