[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Editor v2 - suggestions and queries

From: Greg Stein <gstein_at_gmail.com>
Date: Sat, 5 Nov 2011 19:56:29 -0400

On Fri, Nov 4, 2011 at 11:16, Julian Foad <julian.foad_at_wandisco.com> wrote:
>...
>>> + *        internally respond to cancellation. If the driver then calls
>>> + *        'complete' without intercepting the cancellation, the user
>>> + *        would not be happy to see that commit being completed.
>>
>> That would be a bug in the add, which is not checking. I see no reason
>> to work around a bug.
>
> The 'add' method *should* be cancellable internally if it's taking a
> long time, of course.  My argument still applies if 'add' is doing
> cancellation checks internally but happens not to check during the
> last (say) one second before it returns.  I disagree that that would
> constitute a bug in the add.
>
>> If the caller wants to abort the editor, then it
>> should be able to do so, regardless of the cancellation setting.
>
> I agree that 'abort' should not be cancellable, but I'm talking about
> 'complete'.
>
> I understand it's a feature of the editor design that the driver side
> implementation can batch up the effects of editor calls, buffering the
> data and returning quickly from each of them, and then the 'complete'
> call can take a long time as it flushes the buffer and waits for all
> the transfer to complete.  In this way, the 'complete' call acts like
> a 'flush' followed by a 'close'.  Would we not expect the 'complete'
> function to check internally for cancellation while its 'flush' phase
> is proceeding?  It seems necessary that it should do so, if
> cancellation is to be effective for the user.
>
> As an aside, I found it helpful to think about what the driver is
> allowed to do when an editor function returns 'cancelled'.  "Unless it
> overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
> aborts the transmission by invoking the svn_editor_abort() callback."
> The driver should certainly be allowed to call 'abort', and that call
> should reach the receiver side if at all possible.  For a little time
> I wondered whether we also want the option for a driver to treat the
> cancellation request as a polite request from the user that the driver
> has now sent everything the (driver-side) user cares about and so the
> editor should be closed 'successfully' at this point.  A driver that
> is performing a commit should certainly not behave like that.
> Performing a WC update?  No, that would corrupt the WC.  Printing a
> diff to a remote user's screen?  Even in that hypothetical case,
> although it might be harmless, it's probably much more useful and
> correct to call 'abort' and thus discard any unsent output.  So I
> don't see a need for a driver to be able to call 'complete' after an
> editor call returns 'cancelled', nor can I see that being logically
> sound, given that some editor function may have returned early and
> left some state undefined.
>
> So one point from that is I don't see a way in which the driver should
> be allowed to 'override' the cancellation.  If that were allowed, that
> would imply either that cancellation is only checked between editor
> calls and not at arbitrary points inside them, which is (I think) not
> right, or that the driver has knowledge about the editor
> implementation.
>
> If you're with me so far, then I'd suggest that in terms of
> cancellation 'complete' is more similar to 'add' than it is similar to
> 'abort'.

Hmm. Some interesting points here. And with some ties into your responses later.

It seems that we want to specify some behavior here.. some ideals.
We're kind of getting into the territory that not all drivers will
obey, but we can simply hope. In this case, what will the driver do in
the face of desired cancellation? What do we want the editor API to
do? (can we enforce any of that within the interior editor code?)

For example, one thing we should enforce: an editor should be
completed, or aborted. In debug mode, we can register a pool cleanup
and verify this fact. (though none of our tests attempt cancellation,
so I don't know how useful this is)

I don't think we need to worry too much about whether complete()
checks for cancellation. You're just talking about a race condition
there.

It seems reasonable for a driver to get a cancellation error [from the
editor/receiver] and then wrap things up and call complete(). That
kind of behavior would be totally fine for an "update" drive. There
are no semantic problems with doing a half-update. In this kind of
atmosphere, the notion of "each receiver callback completes before
returning" is very important. Of course, if the receiver defers some
of the work, then the driver *might* be upset, but as long as the
receiver doesn't leave behind integrity problems, then we're all good.

>...
>>> + * ### JAF: The set_* functions treat the properties of a node as
>>> + *     separate from the node's text or symlink target, whereas the
>>> + *     add_* functions treat the properties as an integral part of each
>>> + *     kind of node. This seems like a needless difference. It would make
>>> + *     sense for set_text() to take the properties with the text, and let
>>
>> Sure, that makes sense. Then the set_props() can lose the COMPLETE
>> argument, and we add the restriction that set_text() and set_target()
>> MUST NOT be called before/after the set_props() call.
>>
>> We could also state that props==NULL means "no change". "No
>> properties" (the empty set) would be designated by an empty hash.
>
> Right; and similarly allow contents==NULL and target==NULL.

Hrm. See below.

>>> + *     the implementation worry about whether the props and/or the text
>>> + *     have actually changed (as well as how to transmit the changes
>>> + *     most efficiently). For symlinks, include the properties with the
>>> + *     'set target' call (rename to 'set symlink') and for directories
>>
>> Keep it set target. You are not setting the symlink, you are setting
>> the *target* of the symlink. This is the nomenclature used everywhere
>> already.
>
> Huh?  We're now talking about a single call that sets the target and
> the properties together.  If we take this approach, I suggest naming
> the three calls 'set_symlink' (like 'add_symlink'), 'set_file' (like
> 'add_file'), and 'set_dir_props' (which is not quite like
> 'add_directory').

It was never intended to be set_dir_props(). You could set properties
on any node. That should stick around, cuz it kind of sucks for the
caller to have to know the node type just to set some properties.

But I do like where you're going with this. Rather than set_$kind,
let's go with alter_file(), alter_symlink(), and alter_directory(). I
chose "alter" rather than "change" since change_directory might throw
people off with some implied stateful semantics. Each of the alter_*
functions would provide for changing the properties, symlink target,
file contents, etc. Maybe we eliminate alter_directory() since that
would be exactly the same as set_props() [though maybe that becomes
alter_props?].

>...
>>> + * ### JAF: Nor followed by copy() or move() with the same path as the
>>> + *     copy/move source?
>>
>> You cannot copy/move a missing node.
>
> That sounds right for 'move', because I would assume the source node
> refers to the node as found in the current partially-edited state of
> the tree at the time of the call.
>
> For 'copy', I'm not clear what the semantics are.  Sometimes an edit
> driver might want to make a copy of relpath FOO as found in the
> current partially-edited state of the tree at the time of the call.
> At other times it may want to make a copy of relpath FOO as found in
> the 'old' tree that the edit is based on (not the partially-edited
> state).  How would an edit driver tell the receiver to make a copy
> from some arbitrary node in some (presumably older) committed revision
> that the receiver might or might not know about?

These are all very good concerns. But copy() already talks about a
source revision. I think a real question is how SRC_RELPATH is
specified in that function. Is that a repository relpath? (the
docstring doesn't clarify) Probably should be, since a revision is
specified, and (thus) implying something from a repository.

I do *not* think that copy() should be called for uncommitted items.
That is just a set of add_* calls which mirror the uncommitted source.

>...
>>> - * - svn_editor_delete() must not be used to move a path -- i.e.
>>> - *   svn_editor_delete() must not delete the source path of a previous
>>> + * - svn_editor_delete() should not be used to move a path -- i.e.
>>> + *   svn_editor_delete() should not delete the source path of a previous
>>
>> MUST NOT. Why the change?
>
> The interpretation was rather dependent on the reader's understanding
> of what 'to move' means.  How about this for a better phrasing of the
> whole paragraph:
>
>  - svn_editor_move() is the only way to describe a semantic move.  If
>    the desired semantics is one (or more) copies followed by a delete,
>    you may call svn_editor_delete() to delete the source path of a
>    previous svn_editor_copy() call, but that is not the same as a "move".

Sure.

>...
>>> + * ### JAF: This may be inefficient for an implementation that already
>>> + *     has a diff available but doesn't have a full text available
>>
>> Don't worry about this. In all cases that I identified, the base text
>> was available to apply a patch against to derive the desired new full
>> text.
>>
>> If you can identify a counter-example, then we can discuss. But
>> otherwise, this isn't an issue.
>
> It just seems to me that if I were writing a driver on the repository
> side I'd want to be able to transmit my exising svndiff streams, or
> combined svndiff streams, in the cases where I have them, and only
> resort to building full texts when necessary.  But I haven't looked at
> how the current repository code could do this in a concrete way.

As I've said: I investigated, and nobody *sources* svndiff streams.
Everybody has full text first, and then constructs diffs based on
context. So AFAIK, your hypothesis is false for our current codebase.
And I'm not willing to design for an arbitrary future, rather than our
past decade of experience.

>>>...
>>> + * ### JAF: For the driver: @a contents is a readable stream. The editor
>>> + *     implementation may pull text from it as required, and will then
>>> + *     close it before returning. The editor [### need not | will]
>>> + *     pull all the text from the stream before closing it.
>>
>> Somewhere in the doc, it should say that the receiver MAY close the
>> stream before reading anything [... or ...] all text from it [...].
>
> Good, will do.
>
> I wanted to clarify three separate things here.
>
>  (1) Partial read is allowed.  Good.
>
>  (2) It's a 'pull-mode' interface.  Fine.
>
>  (3) The editor is not allowed to return early and defer the reading
> of this stream until it's ready.  I wonder if we might want to let the
> editor keep several streams open and read from them as and when its
> transmit buffer allows, especially if it wants to be able to send two
> or more file streams in parallel.  These are just shallow thoughts at
> the moment.

Actually... damn. One of the reasons for a separate of set_props() vs
set_text() was to allow for the delayed delivery of contents. Same
thing for the add_file() and set_text(). We adjusted add_file() to
take contents, but that may have been a mistake.

At commit time, we want to delay the delivery of the content streams.

>...
>>> + * ### JAF: Is it permissible for the text change to be a no-op? [...]
>>
>> It is permissible, but stupid. I would be fine with saying "not
>> allowed". Or at least making a statement like "set_text() SHOULD make
>> changes to the contents."
>
> It may be 'stupid' if the driver already knows that the content is
> unchanged, but there may be cases where the driver wouldn't know
> except by scanning the stream, and for those cases it makes sense to
> allow it.

Right. That's why I think "SHOULD" is the appropriate term here.

> Now that we're talking about combining text & props into a single
> call, if we do that then I suggest that ==NULL is recommended but a
> stream with unchanged content is allowed.

Well... see above, ref: delayed content delivery. The API as I
originally designed provided for a delayed delivery. I forgot about
that aspect when I acceded to combining add_file/set_text.

Cheers,
-g
Received on 2011-11-06 00:57:02 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.