Re: Defining atomic "replace"

From: Greg Stein <gstein_at_gmail.com>
Date: Tue, 11 Aug 2009 22:04:09 +0200

In short: sounds great.

On Tue, Aug 11, 2009 at 20:51, Neels J Hofmeyr<neels_at_elego.de> wrote:
>...
> Currently, wherever that happens, the *receiving* side of the editor
> callbacks has to figure out whether a delete coincides with an add at the
> same path. That involves caching deleted nodes == bloat and complication.

Agreed.

And I also agree that we can define a requirement upon drivers'
pattern of operation on the API.

I've also speculated on having the editor.c code implement a debug
mode that verifies all constraints. Since *all* calls on the editor v2
must pass through those functions, we can implement all the semantic
checks. For example, we can put a hash table into the editor structure
and record all the deletes, then verify that an add/copy/move never
arrives for the same path.

> Fact is, there is at least one *driver* of the editor callbacks that has
> already made the correlation of a delete followed by an add, but splits it
> up into a delete followed by an add, following callback design.
>
> My crusade is to move the work for combining delete() + add() into a single
> "replace" into the *driver* of the new callbacks, considering that that
> information may already be lying around for free in most cases.

Right. This is also why I changed the model to "present all children
at directory creation time". This is always known, as far as I'm
aware. And specifying it up front has a number of advantages (as
stated in editor-v2.txt).

> Even if it isn't, I think it is better to make the *driver* do the
> combination work "once", and not burden "all those receivers" with the work
> of correlating deletes with adds.
>
> The problem I ran into is this:
>
> To effectively take the burden off the receivers, we need to *enforce* that
> the driver actually combines delete + add = replace. If the driver had a
> choice to combine or not combine "replace" calls, all those receivers have
> to be able to handle *both* cases, and the whole plan backfires, adding more
> code instead of simplifying.

Yup. And as I said above: we can have a debug mode to validate proper
behaviors from all drivers.

> I will now try to theoretically home in on a Good API definition around
> atomic replace.
>
>
> Aspects:
>
>
> (*) Receivers should not have to combine "replace" operations.
>
> --> We need to *disallow* issuing a delete() followed by an add() operation
> on the same path. The drivers all need to use the atomic-replace channel
> instead, which is to issue an add() operation with a REPLACES_REV arg.

Agreed.

> (*) Two-sided vs. detailed history
> In current merge (and all other operations AFAIK), if a certain path was
> deleted and added multiple times in a given range, all the intermediate
> steps are dropped and the only things that matter are
> Â - the initial state and
> Â - the final state.
> e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
> However, there are rumors that at some point all the intermediate steps will
> be necessary (keyword "operational merge").
>
> --> We need to allow multiple atomic replaces in a given range, so that each
> sub-sequence of (D, A) becomes a (R):
> Â (D, A, M, D, A) becomes (R, M, R) in an operational merge.

I'm not sure that I understand this, but agree with the restriction of
"must combine".

But this does seem to raise a new question: you're talking about an
extra operation here. Some kind of "merge" operation. Does that need
to be present in the new editor API?

Note: if the answer is "unsure; we'll know later", then I consider
that fine, too. This v2 API is much more extensible, so I believe that
we can add more "callbacks" without a much problem.

> (*) Atomic move
> In the new editor callbacks, there is a separate move() call to represent
> atomic moves. It is thus possible to issue a delete() followed by a move()
> to the same path that was deleted. Which is, essentially, a replace!
>
> --> We need to also add a REPLACES_REV argument to the move() callback, and
> we need to disallow issuing a delete followed by a move to the same path.

Agreed.

> Actually, has this case ever been considered? Currently, a move() becomes an
> "add-with-history" plus a delete(). Our API doesn't indicate a move
> replacing a path. We could need another update/status letter for this.

There are LOTS of APIs in our system that do not understand moves. The
new wc_db can record moves, but (in 1.7) nothing is going to *tell* it
to do that. This new editor API can describe moves, but nothing will
drive it that way. Or maybe it will, and the receiver will break it
down to a delete/add to deal with legacy APIs that do not have a move
operation.

I expect that, over time, we will start propagating a "move" concept
further and further through our codebase. Only at legacy points (eg.
speaking to an old server) will they be broken into a delete/add pair.

> (*) Copy
> There also is a new copy() call instead of the "add(copyfrom_revision)". I
> guess it's a bikeshed whether the copy() callback should be separate instead
> of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.

It isn't a bikeshed. There are *four* add variants, based on the kind
of the node you're adding. There is only *one* copy callback because
the node's kind does not magically change during the copy process.
Adding a copy revision/path pair of arguments to all four "add"
variants increases complexity, I think. I also believe that newcomers
have a harder time dealing with the "copy == add with history"
concept.

> Like with moves, it is possible to send a delete() followed by a copy() to
> the same path, which amounts to yet another "replace" variant.
>
> --> We need to also add a REPLACES_REV argument to the copy() callback, and
> disallow issuing a delete followed by a copy to the same path.

Agreed.

> (*) Different meanings
> It is conceivable that one single "replace" call may have a different
> meaning from separate delete() and add() calls. That would be in the way
> that the one type represents a closer relation than the other.
>
> --> My conclusion is that this is not the case / does not matter. An atomic
> replace is simply a new node taking the path of another node, the two of
> them being unrelated. There *are* no less-related cases than a "replace"
> operation (on the same path). There are more-related cases than a "replace",
> which then become a move() or a copy() (a.k.a. add-with-history). (Thus I

Agreed.

> suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
> the added node is related to the node that is being replaced, heavens
> forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
> and REPLACES_REV == SRC_REVISION.)

Eh? Wouldn't this just become a no-op?

>...

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2382655
Received on 2009-08-11 22:04:51 CEST

This message: [ Message body ]
Next message: Ben Collins-Sussman: "Re: ra_svn replay-range + svnsync + mod_dav_svn = corrupted revisions!"
Previous message: Neels Janosch Hofmeyr: "Re: Defining atomic "replace""
In reply to: Neels Janosch Hofmeyr: "Re: Defining atomic "replace""
Next in thread: Bill Tutt: "Re: Defining atomic "replace""
Reply: Bill Tutt: "Re: Defining atomic "replace""

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]