[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Editor v2 - suggestions and queries

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Fri, 4 Nov 2011 15:16:40 +0000

Hi Greg. Thanks for the detailed response.

First on one point from Hyrum:
Hyrum Wright wrote:
    @@ -698,57 +760,51 @@
     * Create a new directory at @a relpath. [...]
     *
    - * If this add is expected to replace a previously existing file or
    - * directory at @a relpath, the revision number of the node to be replaced
    + * If this add is expected to replace a previously existing node
> Use the more specific "directory" rather than "node". (This is a doctoring which only applies to directories.)

No, what we're talking about here is replacing some previous node. A
new directory should be able to replace any kind of previous node.

(Heh, at first I interpreted 'doctoring' as 'editing', but now I guess
'doc-string'.)

Greg Stein wrote:
> Julian Foad wrote:
>>...
>> @@ -177,18 +177,25 @@
>>  *    \n\n
>>  *    Just before each callback invocation is carried out, the @a cancel_func
>>  *    that was passed to svn_editor_create() is invoked to poll any
>>  *    external reasons to cancel the sequence of operations.  Unless it
>>  *    overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
>>  *    aborts the transmission by invoking the svn_editor_abort() callback.
>>  *    Exceptions to this are calls to svn_editor_complete() and
>>  *    svn_editor_abort(), which cannot be canceled externally.
>>  *
>> + *    ### JAF: It should check for cancellation before 'complete'.
>> + *        Imagine the user runs 'svn commit large-new-file' and then
>> + *        tries to cancel it, and imagine svn_editor_add() does not
>> + *        internally respond to cancellation. If the driver then calls
>> + *        'complete' without intercepting the cancellation, the user
>> + *        would not be happy to see that commit being completed.
>
> That would be a bug in the add, which is not checking. I see no reason
> to work around a bug.

The 'add' method *should* be cancellable internally if it's taking a
long time, of course. My argument still applies if 'add' is doing
cancellation checks internally but happens not to check during the
last (say) one second before it returns. I disagree that that would
constitute a bug in the add.

> If the caller wants to abort the editor, then it
> should be able to do so, regardless of the cancellation setting.

I agree that 'abort' should not be cancellable, but I'm talking about
'complete'.

I understand it's a feature of the editor design that the driver side
implementation can batch up the effects of editor calls, buffering the
data and returning quickly from each of them, and then the 'complete'
call can take a long time as it flushes the buffer and waits for all
the transfer to complete. In this way, the 'complete' call acts like
a 'flush' followed by a 'close'. Would we not expect the 'complete'
function to check internally for cancellation while its 'flush' phase
is proceeding? It seems necessary that it should do so, if
cancellation is to be effective for the user.

As an aside, I found it helpful to think about what the driver is
allowed to do when an editor function returns 'cancelled'. "Unless it
overrides the cancellation (denoted by #SVN_ERR_CANCELLED), the driver
aborts the transmission by invoking the svn_editor_abort() callback."
The driver should certainly be allowed to call 'abort', and that call
should reach the receiver side if at all possible. For a little time
I wondered whether we also want the option for a driver to treat the
cancellation request as a polite request from the user that the driver
has now sent everything the (driver-side) user cares about and so the
editor should be closed 'successfully' at this point. A driver that
is performing a commit should certainly not behave like that.
Performing a WC update? No, that would corrupt the WC. Printing a
diff to a remote user's screen? Even in that hypothetical case,
although it might be harmless, it's probably much more useful and
correct to call 'abort' and thus discard any unsent output. So I
don't see a need for a driver to be able to call 'complete' after an
editor call returns 'cancelled', nor can I see that being logically
sound, given that some editor function may have returned early and
left some state undefined.

So one point from that is I don't see a way in which the driver should
be allowed to 'override' the cancellation. If that were allowed, that
would imply either that cancellation is only checked between editor
calls and not at arbitrary points inside them, which is (I think) not
right, or that the driver has knowledge about the editor
implementation.

If you're with me so far, then I'd suggest that in terms of
cancellation 'complete' is more similar to 'add' than it is similar to
'abort'.

>>...
>> @@ -203,76 +210,111 @@
>>  *
>>  * <h3>Driving Order Restrictions</h3>
>>  * In order to reduce complexity of callback receivers, the editor callbacks
>>  * must be driven in adherence to these rules:
>>  *
>>  * - svn_editor_add_directory() -- Another svn_editor_add_*() call must
>>  *   follow for each child mentioned in the @a children argument of any
>>  *   svn_editor_add_directory() call.
>>  *
>> + *   ### JAF: ... must follow at any time after the parent add_directory(),
>> + *       and not before it.
>
> Huh? The statement already says the children must follow. I don't
> understand this seeming redundancy.

Just wanted to clarify it's *any time* after, making sure there's no
implication that the children must follow the parent before we mention
any path outside of the added tree, or anything like that. I'll add a
general note somewhere to that effect.

>>...
>>  *   In other words, if there are two calls coming in on the same path, the
>>  *   first of them has to be svn_editor_set_props().
>>  *
>> + * ### JAF: The set_* functions treat the properties of a node as
>> + *     separate from the node's text or symlink target, whereas the
>> + *     add_* functions treat the properties as an integral part of each
>> + *     kind of node. This seems like a needless difference. It would make
>> + *     sense for set_text() to take the properties with the text, and let
>
> Sure, that makes sense. Then the set_props() can lose the COMPLETE
> argument, and we add the restriction that set_text() and set_target()
> MUST NOT be called before/after the set_props() call.
>
> We could also state that props==NULL means "no change". "No
> properties" (the empty set) would be designated by an empty hash.

Right; and similarly allow contents==NULL and target==NULL.

>> + *     the implementation worry about whether the props and/or the text
>> + *     have actually changed (as well as how to transmit the changes
>> + *     most efficiently). For symlinks, include the properties with the
>> + *     'set target' call (rename to 'set symlink') and for directories
>
> Keep it set target. You are not setting the symlink, you are setting
> the *target* of the symlink. This is the nomenclature used everywhere
> already.

Huh? We're now talking about a single call that sets the target and
the properties together. If we take this approach, I suggest naming
the three calls 'set_symlink' (like 'add_symlink'), 'set_file' (like
'add_file'), and 'set_dir_props' (which is not quite like
'add_directory').

>> + *     introduce a 'set directory' function, and remove the set_props()
>> + *     function. This would eliminate the exception in the "Once Rule".
>
> Sure.
>
>>...
>>  * - The ancestor of an added or modified node may not be deleted. The
>>  *   ancestor may not be moved (instead: perform the move, *then* the edits).
>> + *
>> + * ### JAF: Ancestor == parent dir?
>
> Any ancestor, not just the parent.

Right. (My doubt was about whether it could mean a time-wise rather
than path-wise 'ancestor'.)

>> + *
>> + * ### JAF: The ancestor of a node that is modified or added or copied-here
>> + *     or moved-here ...?
>
> Sure.

OK, I've updated the text.

>> + *
>> + * ### JAF: The ancestor of ... [can | may not?] be copied or moved.
>
> The statement says the ancestor may not be moved. Since it says
> nothing about copying: sure, that is allowed (as you'd expect, so we
> don't have to call it out).

If only we could always be so sure that anything not explictly stated
in our docs is not true! Seriously, not knowing how thoroughly the
text has been reviewed, I couldn't assume this with any certainty.

>>  * - svn_editor_delete() must not be used to replace a path -- i.e.
>>  *   svn_editor_delete() must not be followed by an svn_editor_add_*() on
>>  *   the same path, nor by an svn_editor_copy() or svn_editor_move() with
>>  *   the same path as the copy/move target.
>> + *
>> + * ### JAF: Nor followed by set_*().
>
> You cannot call set_*() on a missing node, so there is no reason to state this.

OK.

>> + * ### JAF: Nor followed by copy() or move() with the same path as the
>> + *     copy/move source?
>
> You cannot copy/move a missing node.

That sounds right for 'move', because I would assume the source node
refers to the node as found in the current partially-edited state of
the tree at the time of the call.

For 'copy', I'm not clear what the semantics are. Sometimes an edit
driver might want to make a copy of relpath FOO as found in the
current partially-edited state of the tree at the time of the call.
At other times it may want to make a copy of relpath FOO as found in
the 'old' tree that the edit is based on (not the partially-edited
state). How would an edit driver tell the receiver to make a copy
from some arbitrary node in some (presumably older) committed revision
that the receiver might or might not know about?

> The point of the above statement is to enforce that replacement is not
> allowed using a delete, followed by something that (re)constructs a
> node at that path. You don't need to (re)define the fact that set,
> copy, move must have a valid source.

OK.

>>  *   Instead of a prior delete call, the add/copy/move callbacks should be
>>  *   called with the @a replaces_rev argument set to the revision number of
>>  *   the node at this path that is being replaced.  Note that the path and
>>  *   revision number are the key to finding any other information about the
>>  *   replaced node, like node kind, etc.
>>  *   @todo say which function(s) to use.
>> + *
>> + * ### JAF: Can we have a way to refer to replacing a node in a tree that
>> + *     is not a committed revision and so doesn't have a revision number?
>> + *     This seems to be the only place where the editor definition requires
>> + *     that a target node is a (copy of a) committed repository node.
>
> Makes sense. We can state that SVN_INVALID_REVNUM means "uncommitted"
> (rather than "wildcard").

We'll have to think a bit deeper on this.

>>  *
>> - * - svn_editor_delete() must not be used to move a path -- i.e.
>> - *   svn_editor_delete() must not delete the source path of a previous
>> + * - svn_editor_delete() should not be used to move a path -- i.e.
>> + *   svn_editor_delete() should not delete the source path of a previous
>
> MUST NOT. Why the change?

The interpretation was rather dependent on the reader's understanding
of what 'to move' means. How about this for a better phrasing of the
whole paragraph:

  - svn_editor_move() is the only way to describe a semantic move. If
    the desired semantics is one (or more) copies followed by a delete,
    you may call svn_editor_delete() to delete the source path of a
    previous svn_editor_copy() call, but that is not the same as a "move".

>>...
>>  * - If any callback invocation returns with an error, the driver must
>>  *   invoke svn_editor_abort() and stop transmitting operations.
>> + *
>> + * - ### JAF: Some restriction analogous to the receiver's "All callbacks
>> + *       must complete their handling of a path before they return ..."?
>
> I don't see the analogy.

Oops, ignore me. I didn't mean the paragraph above about invoking
abort() is analogous, I meant do we need to specify anything on the
driver side about all editor functions needing to complete their
handling of a path before return. But I think that's basically what
all the many paragraphs above are already describing, so nothing more
to say here.

But now I think about it, I'm not clear on what "must complete their
handling of a path" really means for the receiver.

[...]
>>  * @todo ### TODO anything missing? -- allow text and prop change to follow
>> - * a move or copy. -- set_text() vs. apply_text_delta()? -- If a
>> + * a move or copy. -- set_text() vs. apply_text_delta()?
>> + *
>> + * ### JAF: Those two seem to be done already.
>
> Yup.
>
>> + *
>> + *  -- If a
>>  * set_props/set_text/set_target/copy/move/delete in a merge source is
>>  * applied to a different branch, which side will REVISION arguments reflect
>>  * and is there still a problem?
>> + *
>> + * ### JAF: The 'revision' argument will always refer to the 'before'
>> + *     state of the source node that the editor is editing. The receiver
>> + *     must know about this source node to make sense of the edit. If
>> + *     the receiver is merging the source-node edits into some other
>> + *     node on a different target branch, the editor knows nothing about
>> + *     that other node (which may not even have a revision number if
>> + *     it's a working version that's been modified already by some
>> + *     previous merge or user intervention).
>
> Right. svn_editor changes a tree from one state to another. "Merge"
> doesn't come into play whatsoever. "Make the tree look like <X>".
> Effectively, the driver computes the merge result, and then applies
> that to the target tree via svn_editor.

OK, I've removing these three notes.

>> + * ### JAF: As well as the text checksum for files, consider adding an
>> + *     (optional?) checksum for the full content of every node kind --
>> + *     text with props, symlink with props, dir with list of children.
>
> Huh? We've never talking about adding more checksums. I don't think
> this is the place to start.

OK, it was just a thought.

>>  /** Drive @a editor's #svn_editor_cb_add_absent_t callback.
>>  *
>>  * Create an "absent" node of kind @a kind at @a relpath. The immediate
>>  * parent of @a relpath is expected to exist.
>>  * ### TODO @todo explain "absent".
>> + * ### JAF: What are the allowed values of 'kind'?
>
> Not sure. My initial assumption was {dir, file, symlink}.
>
> Do we need to be able to set server-excluded, and excluded? I believe
> these latter two concepts are derived from the notion of "this $kind
> node should be marked absent", and it just depends on who receives
> such a notice. ie. the update receiver takes an absent marker as
> "server-excluded".

>>  /** Drive @a editor's #svn_editor_cb_set_text_t callback.
>>  *
>>  * Set/change the text content of a file at @a relpath to @a contents
>>  * with checksum @a checksum.
>>  * ### TODO @todo Does this send the *complete* content, always?
>> + * ### JAF: I think the idea is that 'contents' is the new full text and,
>> + *     if wanted, the driver and receiver implementations should diff and
>> + *     patch (respectively) it against the old full text.
>
> The CONTENTS param is always the new full text. Always.
>
> If the receiver is an RA committing editor, then it MAY choose to
> generate a diff to send to the server. But that is private to the
> receiver. If the driver is an RA update driver, then it MAY need to
> apply a diff received from the server against the (known) base text
> producing the new full text.
>
> No other formats should be used within the CONTENTS stream (e.g never
> an svndiff).

Yup, that's what I meant.

>> + * ### JAF: This may be inefficient for an implementation that already
>> + *     has a diff available but doesn't have a full text available
>
> Don't worry about this. In all cases that I identified, the base text
> was available to apply a patch against to derive the desired new full
> text.
>
> If you can identify a counter-example, then we can discuss. But
> otherwise, this isn't an issue.

It just seems to me that if I were writing a driver on the repository
side I'd want to be able to transmit my exising svndiff streams, or
combined svndiff streams, in the cases where I have them, and only
resort to building full texts when necessary. But I haven't looked at
how the current repository code could do this in a concrete way.

>>...
>> + * ### JAF: For the driver: @a contents is a readable stream. The editor
>> + *     implementation may pull text from it as required, and will then
>> + *     close it before returning. The editor [### need not | will]
>> + *     pull all the text from the stream before closing it.
>
> Somewhere in the doc, it should say that the receiver MAY close the
> stream before reading anything [... or ...] all text from it [...].

Good, will do.

I wanted to clarify three separate things here.

  (1) Partial read is allowed. Good.

  (2) It's a 'pull-mode' interface. Fine.

  (3) The editor is not allowed to return early and defer the reading
of this stream until it's ready. I wonder if we might want to let the
editor keep several streams open and read from them as and when its
transmit buffer allows, especially if it wants to be able to send two
or more file streams in parallel. These are just shallow thoughts at
the moment.

> [...] drivers may choose to implement a "lazy-open" stream. [...]

>> + * ### JAF: For the receiver: @a contents is a readable stream. The
>> + *     receiver may pull text from it as required, and will then
>> + *     close it before returning. The receiver [### need not | will]
>> + *     pull all the text from the stream before closing it.
>
> This is basically above. The editor framework will never touch the
> stream, so we're only talking about the receiver's callback here.

It's worth being clear about the same three points, as I don't see
that the receiver callback's rules must necessarily be the same as the
driver's.

>> + * ### JAF: Is it permissible for the text change to be a no-op? [...]
>
> It is permissible, but stupid. I would be fine with saying "not
> allowed". Or at least making a statement like "set_text() SHOULD make
> changes to the contents."

It may be 'stupid' if the driver already knows that the content is
unchanged, but there may be cases where the driver wouldn't know
except by scanning the stream, and for those cases it makes sense to
allow it.

Now that we're talking about combining text & props into a single
call, if we do that then I suggest that ==NULL is recommended but a
stream with unchanged content is allowed.

- Julian
Received on 2011-11-04 16:17:15 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.