Re: add NODES.prior_deleted boolean column

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Thu, 23 Sep 2010 14:19:35 +0100

On Wed, 2010-09-22, Greg Stein wrote:
> On Tue, Sep 21, 2010 at 13:41, Julian Foad wrote:
> > Greg Stein wrote:
> >> After working through the several email messages, and discussion, I
> >> believe we're now down to a simple change:
> >>
> >> * add a "prior_deleted" flag to NODES
> >>
> >> The flag simply means that a node exists prior to this layer and has
> >> been deleted or moved-away. The 'presence' column may say the same
> >> thing, but it might also describe data that is replacing the
> >> deletion/move.
> >
> > Do you see this working in conjunction with the current set of presence
> > values, or your proposed extended set?
>
> The extended set. The "current set" is dead to me.

OK.

> > That flag would just mean "There is a row for the same path with a
> > smaller op_depth and with a non-negative kind of presence", right? So
> > whether we actually store that flag is a matter of impl/efficiency, not
> > of logical design. Have I understood?
>
> No. It means that this (relpath/op_depth) layer was created by a
> deletion ("svn rm" or "svn mv").

OK, that's fine. What I'm trying to ascertain next is whether this
information is already stored in the table without adding this extra
column. Let me try again, differently:

If this (relpath/op_depth) row shadows another row, and the shadowed
row's presence is not 'deleted', then it's either deleting or replacing
something. So prior_deleted = TRUE.

If it shadows a row that is 'deleted', then this row must be creating
a replacement: it doesn't need to delete first, and it can't be just a
delete. So prior_delete = FALSE.

If it doesn't shadow another row, then it's creating a new node
(add/cp/mv-here). So prior_delete = FALSE.

So prior_deleted == "this row shadows another row, and that shadowed row
is not 'deleted'".

That works for all the cases I can hold in my head. Maybe I've missed
some cases involving switched nodes, or mixed revs, or something, that
break that rule.

> Additional operations (add, copy-here, move-here) might alter this
> layer, but it started with the deletion of a prior node/subtree.
>
> > The subject that this arose from was how to store all the possible
> > states of a working row. First I want to know what are all the possible
> > states of a working row that we need to represent, before we decide how
> > to represent them. I don't think we have ever written them down yet, in
> > full detail, so I have tried.
> >
> > Please see the two tables in the "nodes-states" document that I am
> > attaching as .ods and as .pdf, and as two .png images. I'm not sure
> > whether any of the attachments will get through to the list.
>
> These tables are so much more complicated that it is very hard to
> read/understand them. It is billowing the combinatorics of what seems
> pretty darned simple.

I'd be more than happy to see the rules written in some other form. I'm
just trying to get the design spelled out precisely, it doesn't matter
to me whether that includes this kind of table or not. But at this
stage, thinking about each of the combinations is helpful to me.

> op_depth==0 can have the following presence values: normal,
> not-present, excluded, incomplete, unauthz. prior_deleted is always 0
> (FALSE).
>
> op_depth > 0 can have the following presence values: added, deleted,
> copied-here, moved-here, excluded, incomplete. prior_deleted may be 0
> or 1. moved_to is a discriminator between deleted and moved-away.

Great, this is good.

Let's go into a bit more detail on some of the op_depth > 0 values.

'incomplete':

Let's see where 'incomplete' can be used at op_depth > 0. Primarily
during a repo->WC or WC->WC copy, constructing the 'copied-here' dirs.
Maybe also during a (WC->WC) move, constructing the 'moved-here' dirs,
so that the whole subtree need not be moved in a single database
operation. If so, I'm wondering whether we need to distinguish
copied-here-incomplete from moved-here-incomplete. Or maybe we can
always rely on the caller remembering what it's doing and setting the
correct state afterwards. I'm not sure. If we start out assuming it
refers to a copy-here, and later find that we do need to distinguish
these, then I think we could do so at that time by introducing another
value.

'excluded':

I think we need to support 'excluded' at op_depth > 0 on a copied-here
node (only for a child, not the root of the copy), and also on a
moved-here node (child). We should distinguish these. How?

* Call them 'copied-here-excluded' and 'moved-here-excluded'.

* Make 'excluded' a separate flag.

* Go back to distinguishing 'move' from 'copy' by means of a flag
instead of difference 'presence' values.

'moved-here':

I believe the idea of the 'move' in WC-NG is that it represents an
atomic move. We should consider how a move is to be represented, in
full.

A 'moved-here' row's 'moved_to' column points to the corresponding
'moved-away' row. The 'source' of the move is the row that is shadowed
by the 'moved-away' row.

The destination of the move must always refer to the same repo node-rev
as the source did. Therefore its repo node-rev columns and stored
(pristine) content columns (file checksum, link target, dir depth) could
be either stored as duplicates of the source row values, or stored as
null and always looked up via the source row when needed.

One of the particular desired behaviours of a move is that 'update' will
still update it. If we keep the idea that 'update' just updates the
BASE layer, we should design it so that the 'moved-here' row has null
for its node-rev columns, and the APIs that look at a working layer row
should look up the values via the move-source row when needed.

Alternatively, if we duplicate those columns then 'update' will have to
follow moves and adjust the 'moved-here' columns to keep them in sync.

> >...
> > I assume this is in conjunction with the current set of presence values,
> > not your proposed extended set. So the possible changes would be
> > encoded as:
> >
>
> Did you omit something here?

Oops, I meant to delete that paragraph.

> (I got the email as you sent it directly to me; no missing
> attachments; it seems like above is missing something tho)
>
> >> When a deletion (of a subtree) occurs, then we create a new layer at
> >> <relpath, op_depth>. New rows are written for the root, and all
> >> children, using that op_depth value. If this is a moved-away, then we
> >> store the destination into moved_to at the root *only* (which can then
> >> later discriminate between the two types of deletions; children need
> >> to look to the root to discriminate; I bet this need is rare). Note
> >> that the deletion process needs to look for mods to descendents:
> >> deletes are integrated into this one; other operations may error with
> >> "can't delete local mods" or somesuch.
> >>
> >> For the following actions, these are applied to the root of a deletion:
> >
> > What do you mean "these are applied to the root of a deletion"? I guess
> > "add", "copy-here", "move-here" can only be applied to the root of a
> > deletion or to an unversioned/not-present path; is that it?
>
> Correct. You cannot add/copy-here/move-here to a descendent of the
> root of a deletion -- the root is still missing, so you have no parent
> for the new nodes.
>
> >> If an add occurs, then the root is updated to set presence='added'. No
> >> other changes are needed.
> >
> > Apart from setting the new node kind. And apart from changing the
> > op_depth of all its still-deleted children to obey the deletion-op-depth
> > rule:
> >
> > checkout: (A/B, A/B/C, A/B/gamma),op_d=0,normal
> >
> > delete A/B: add rows (A/B, A/B/C, A/B/gamma),op_d=2,deleted
>
> clarify as: presence=deleted. prior_deleted=TRUE.
>
> > add new file B: modify row (A/B),op_d=2:
> > presence/status := deleted+added
> > kind := file
>
> Yes.
>
> > modify rows (A/B/C, A/B/gamma),op_d=2:
> > op_d := 3
>
> No.
>
> The children don't have to be touched. They just hang out in their
> deleted state with the same op_depth.

Oh yes, my mistake.

> We *never* want to modify a rows
> op_depth. That is part of its primary key(!).
>
> If you changed your operation above to: add subtree B, B/C, B/gamma,
> then you would have:
>
> 1. modify <A/B, 2>: presence=added. (and kind=dir)
> 2. add <A/B/C, 3>: presence=added. kind=dir.
> 3. add <A/B/gamma, 3>: presence=added. kind=file.
>
> The B/C and B/gamma nodes are at a higher op_depth, so they are now
> visible instead of the deleted nodes.

> On Wed, Sep 22, 2010 at 18:27, Greg Stein <gstein_at_gmail.com> wrote:
> >...
> > The children don't have to be touched. They just hang out in their
> > deleted state with the same op_depth. We *never* want to modify a rows
> > op_depth. That is part of its primary key(!).
>
> Let me clarify this. There is one situation where we (effectively)
> modify the op_depth.
>
> Consider:
>
> 1. checkout. add rows <A, 0>, <A/B, 0>, <A/B/C, 0>, <A/B/gamma, 0>:
> presence=normal.
> 2. delete gamma. add row <A/B/gamma, 3>: presence=deleted.
> 3. delete B. add rows <A/B, 2>, <A/B/C, 2>: presence=deleted. modify
> <A/B/gamma, 3>: op_depth=2
>
> We subsume the child deletion into the parent deletion. We could just
> as well revert/forget the child deletion and add rows for all children
> at op_depth, or we can just rewrite the op_depth. Net is the same.
>
>
> Outside of this deletion, I don't think we want to be modifying
> op_depth at all.

Except for reverting, where you can get the reverse of your last point.
For example, follow your last step "3." with "4. revert B. delete row
<A/B, 2>. modify rows <A/B/C, 2>, <A/B/gamma, 2>: op_depth = 3.

- Julian
Received on 2010-09-23 15:25:37 CEST

This message: [ Message body ]
Next message: Julian Foad: "Re: more benchmarks: comparing runtime of 1.6.x vs. trunk aka 1.7 / wc-ng"
Previous message: Neels J Hofmeyr: "more benchmarks: comparing runtime of 1.6.x vs. trunk aka 1.7 / wc-ng"
In reply to: Greg Stein: "Re: add NODES.prior_deleted boolean column"
Next in thread: Greg Stein: "Re: add NODES.prior_deleted boolean column"
Reply: Greg Stein: "Re: add NODES.prior_deleted boolean column"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]