[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: add NODES.prior_deleted boolean column

From: Greg Stein <gstein_at_gmail.com>
Date: Thu, 23 Sep 2010 12:57:53 -0400

On Thu, Sep 23, 2010 at 09:19, Julian Foad <julian.foad_at_wandisco.com> wrote:
> On Wed, 2010-09-22, Greg Stein wrote:
>> > That flag would just mean "There is a row for the same path with a
>> > smaller op_depth and with a non-negative kind of presence", right?  So
>> > whether we actually store that flag is a matter of impl/efficiency, not
>> > of logical design.  Have I understood?
>> No. It means that this (relpath/op_depth) layer was created by a
>> deletion ("svn rm" or "svn mv").
> OK, that's fine.  What I'm trying to ascertain next is whether this
> information is already stored in the table without adding this extra
> column.  Let me try again, differently:

Sure, you could derive the column by examining layers underneath. The
semantics of add/copy/move is "not allowed if something exists", so
anything that was there has been deleted first.

The underlying question is "what is the query pattern? do we ask 'was
something deleted by this operation?' very often?". If the answer is
yes, then keep the flag. If we don't ask that question often, then run
the extra query looking for prior nodes.

I believe the answer is "often". A simple 'svn status' will need to
distinguish between 'A' and 'R', and that is done by checking

And no... this amount of denormalization will not hurt us.

>> op_depth > 0 can have the following presence values: added, deleted,
>> copied-here, moved-here, excluded, incomplete. prior_deleted may be 0
>> or 1. moved_to is a discriminator between deleted and moved-away.
> Great, this is good.
> Let's go into a bit more detail on some of the op_depth > 0 values.
> 'incomplete':
> Let's see where 'incomplete' can be used at op_depth > 0.  Primarily
> during a repo->WC or WC->WC copy, constructing the 'copied-here' dirs.
> Maybe also during a (WC->WC) move, constructing the 'moved-here' dirs,
> so that the whole subtree need not be moved in a single database
> operation.  If so, I'm wondering whether we need to distinguish
> copied-here-incomplete from moved-here-incomplete.  Or maybe we can
> always rely on the caller remembering what it's doing and setting the
> correct state afterwards.  I'm not sure.  If we start out assuming it
> refers to a copy-here, and later find that we do need to distinguish
> these, then I think we could do so at that time by introducing another
> value.

I don't think we need to discriminate *how* or *why* the incomplete
nodes were created... just that they are. And if find we need to ask
that question, then we can examine the parent node, or we can add new
presence values.

> 'excluded':
> I think we need to support 'excluded' at op_depth > 0 on a copied-here
> node (only for a child, not the root of the copy), and also on a
> moved-here node (child).  We should distinguish these.  How?

"should"? Why? Again, looking at the parent immediately tells us what
is going on. But I don't even see where/how/why we need to know this

> 'moved-here':
> I believe the idea of the 'move' in WC-NG is that it represents an
> atomic move.  We should consider how a move is to be represented, in
> full.
> A 'moved-here' row's 'moved_to' column points to the corresponding
> 'moved-away' row.  The 'source' of the move is the row that is shadowed
> by the 'moved-away' row.


A 'moved-here' node has repository source information given in the
repos_* columns.

A 'moved-away' node has presence='deleted' and moved_to set to the
destination local_relpath.

At the moment, we do not record the local_relpath of the source of a
copy/move. (it may be null for a repo->WC copy; I believe we would not
allow a repo->WC move)

> The destination of the move must always refer to the same repo node-rev
> as the source did.  Therefore its repo node-rev columns and stored
> (pristine) content columns (file checksum, link target, dir depth) could
> be either stored as duplicates of the source row values, or stored as
> null and always looked up via the source row when needed.

Duplicates. No need to make querying harder than it should be.

> One of the particular desired behaviours of a move is that 'update' will
> still update it.

So you say. I don't know that that is true.

After a WC-WC copy, we don't update the copied nodes (afaik), so I
don't see that this immediately follows.

>> On Wed, Sep 22, 2010 at 18:27, Greg Stein <gstein_at_gmail.com> wrote:
>> >...
>> > The children don't have to be touched. They just hang out in their
>> > deleted state with the same op_depth. We *never* want to modify a rows
>> > op_depth. That is part of its primary key(!).
>> Let me clarify this. There is one situation where we (effectively)
>> modify the op_depth.
>> Consider:
>> 1. checkout. add rows <A, 0>, <A/B, 0>, <A/B/C, 0>, <A/B/gamma, 0>:
>> presence=normal.
>> 2. delete gamma. add row <A/B/gamma, 3>: presence=deleted.
>> 3. delete B. add rows <A/B, 2>, <A/B/C, 2>: presence=deleted. modify
>> <A/B/gamma, 3>: op_depth=2
>> We subsume the child deletion into the parent deletion. We could just
>> as well revert/forget the child deletion and add rows for all children
>> at op_depth, or we can just rewrite the op_depth. Net is the same.
>> Outside of this deletion, I don't think we want to be modifying
>> op_depth at all.
> Except for reverting, where you can get the reverse of your last point.
> For example, follow your last step "3." with "4. revert B. delete row
> <A/B, 2>. modify rows <A/B/C, 2>, <A/B/gamma, 2>: op_depth = 3.

No. Step 3 subsumed the delete from step 2. All knowledge of it is
lost. A revert will delete the three rows at op_depth=2 and expose the
original checked-out rows.

Received on 2010-09-23 18:58:31 CEST

This is an archived mail posted to the Subversion Dev mailing list.