Hyrum, Eric H., Philip M. and I met up at WANdisco's office in Sheffield
(England) yesterday and today. One of the things we discussed was
NODE_DATA.

We discussed several sub-topics; here are two of them. I've written
these up together with my own further thoughts, which weren't part of
the discussion, so this account is biased.

-----------------------

Observation:

The new NODE_DATA table can completely subsume the old BASE_NODE and
WORKING_NODE tables.

  BASE_NODE    => NODE_DATA(op_depth==0),
  WORKING_NODE => NODE_DATA(op_depth==max).
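
To make the mapping above concrete, here is a minimal Python/sqlite3
sketch. It is only an illustration: the three-column node_data table is
a hypothetical, much-reduced stand-in for the real schema in
wc-metadata.sql, and the helper names (open_db, base_node, working_node)
are invented for this example. It also assumes that "op_depth==max"
means the highest op_depth greater than zero.

```python
import sqlite3

# Hypothetical, much-reduced NODE_DATA schema (assumption for this sketch;
# the real wc-metadata.sql has many more columns).
SCHEMA = """
CREATE TABLE node_data (
    local_relpath TEXT    NOT NULL,
    op_depth      INTEGER NOT NULL,
    presence      TEXT    NOT NULL,
    PRIMARY KEY (local_relpath, op_depth)
);
"""


def open_db(rows):
    """Create an in-memory DB holding (local_relpath, op_depth, presence) rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO node_data VALUES (?, ?, ?)", rows)
    return db


def base_node(db, path):
    """What BASE_NODE used to hold: the op_depth == 0 row, or None."""
    return db.execute(
        "SELECT op_depth, presence FROM node_data"
        " WHERE local_relpath = ? AND op_depth = 0",
        (path,),
    ).fetchone()


def working_node(db, path):
    """What WORKING_NODE used to hold: the highest op_depth > 0 row, or None."""
    return db.execute(
        "SELECT op_depth, presence FROM node_data"
        " WHERE local_relpath = ? AND op_depth > 0"
        " ORDER BY op_depth DESC LIMIT 1",
        (path,),
    ).fetchone()


if __name__ == "__main__":
    db = open_db([("A1", 0, "normal"), ("A1", 1, "normal"), ("B1", 0, "normal")])
    print(base_node(db, "A1"), working_node(db, "A1"), working_node(db, "B1"))
```

With a single table, "is there a working node?" becomes a question about
which op_depth values exist for a path rather than about which table has
a row.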

A few of the columns do not make sense at every op_depth:
translated_size and last_mod_time are meaningful only for the topmost
node. Even so, putting these columns into this table seems better than
keeping them in a separate table.

  BASE_NODE                NODE_DATA             WORKING_NODE
  -------------------      ---------------       ---------------
  Indexing             =>  yes + op_depth    <=  Indexing
  presence             =>  yes               <=  presence
  Node-Rev             =>  yes: original_*   <=  copyfrom_*
  Content              =>  yes               <=  Content
  Last-Change          =>  yes               <=  Last-Change
  translated_size      =>  TODO              <=  translated_size
  last_mod_time        =>  TODO              <=  last_mod_time
  file_external        x   no - obsolete
  dav_cache            =>  TODO
  incomplete_children  x   no - obsolete
                           no - not ready?   <=  moved_here
                           no - not ready?   <=  moved_to
                           no - obsolete     x   keep_local

By "TODO" I mean not yet in wc-metadata.sql.

By "not ready?" I mean we're not ready to fully define and use the
'moved_*' columns, so it would be better to insert them with a WC format
upgrade when we are ready.

-----------------------

[RFC] Instead of recording the "deleted" state as a presence value in
the topmost operative layer, consider whether to record it by means of a
flag in the next-nearest layer *beneath*.

This initially sounded appealing, but I'm not so sure now. It could
just be a premature optimisation side-track. It's certainly a way down
the list of Important Things.

Example sequence of operations, showing representation in both schemes:

Operation: clean checkout

                 flag in layer beneath      presence in topmost layer
  WC paths       op_depth=0  op_depth=1     op_depth=0  op_depth=1
  ------------   ----------  ----------     ----------  ------------
  A1/            norm                       norm
  +- f.old       norm                       norm
  +- f           norm                       norm
  B1/            norm                       norm
  +- f           norm                       norm
  +- f.new       norm                       norm

Operation: delete ./A1

                 flag in layer beneath      presence in topmost layer
  WC paths       op_depth=0  op_depth=1     op_depth=0  op_depth=1
  ------------   ----------  ----------     ----------  ------------
  A1/            norm Del!                  norm        base_deleted
  +- f.old       norm Del!                  norm        base_deleted
  +- f           norm Del!                  norm        base_deleted
  B1/            norm                       norm
  +- f           norm                       norm
  +- f.new       norm                       norm

Operation: copy ^/B1 to ./A1, replacing ^/A1

                 flag in layer beneath      presence in topmost layer
  WC paths       op_depth=0  op_depth=1     op_depth=0  op_depth=1
  ------------   ----------  ----------     ----------  ------------
  A1/            norm Del!   norm           norm        norm
  +- f.old       norm Del!                  norm        base_deleted
  +- f           norm Del!   norm           norm        norm
  +- f.new                   norm                       norm
  B1/            norm                       norm
  +- f           norm                       norm
  +- f.new       norm                       norm

Flag in layer beneath doesn't require a final base_deleted row in the
topmost layer. The flag is required for (and only for) paths where a
row exists in the previous layer, so it seems more space-efficient to
store it there.

Flag in layer beneath allows simpler reverting of the "add" half of a
replacement: (remove all op_depth=N rows that are children of this op)
rather than (convert all these rows to a different presence value that
depends on their presence in layer N-1). That's not considered an
important UI feature, but the fact that it can be done by a logically
simple operation is likely to result in simpler, less buggy code.
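
The difference between the two revert strategies can be sketched in a
few lines of Python. This is an illustration only, not proposed code:
rows are modelled as a plain dict keyed by (path, op_depth), the
function names are invented, and the presence-scheme version simplifies
"depends on their presence in layer N-1" to "a row exists in some lower
layer". The data is the state after the "copy ^/B1 to ./A1" example.

```python
def is_under(path, root):
    """True if path is the op root itself or one of its descendants."""
    return path == root or path.startswith(root + "/")


def revert_add_flag_scheme(rows, root, depth):
    """Flag-beneath scheme: drop every row of this op.

    rows maps (path, op_depth) -> (presence, deleted_by_layer_above).
    The flags in the layer beneath are left untouched, so the paths
    simply fall back to "deleted".
    """
    return {
        key: value
        for key, value in rows.items()
        if not (key[1] == depth and is_under(key[0], root))
    }


def revert_add_presence_scheme(rows, root, depth):
    """Presence scheme: each row of this op is either rewritten or dropped.

    rows maps (path, op_depth) -> presence. Simplifying assumption: any
    row in a lower layer counts as "something to keep deleted".
    """
    result = {}
    for (path, d), presence in rows.items():
        if d == depth and is_under(path, root):
            if any((path, lower) in rows for lower in range(depth)):
                result[(path, d)] = "base_deleted"
            # else: the path exists only because of the add, so drop the row
        else:
            result[(path, d)] = presence
    return result


# State after "copy ^/B1 to ./A1, replacing ^/A1" (A1 subtree only).
flag_rows = {
    ("A1", 0): ("normal", True),
    ("A1/f.old", 0): ("normal", True),
    ("A1/f", 0): ("normal", True),
    ("A1", 1): ("normal", False),
    ("A1/f", 1): ("normal", False),
    ("A1/f.new", 1): ("normal", False),
}
presence_rows = {
    ("A1", 0): "normal",
    ("A1/f.old", 0): "normal",
    ("A1/f", 0): "normal",
    ("A1", 1): "normal",
    ("A1/f.old", 1): "base_deleted",
    ("A1/f", 1): "normal",
    ("A1/f.new", 1): "normal",
}

if __name__ == "__main__":
    print(revert_add_flag_scheme(flag_rows, "A1", 1))
    print(revert_add_presence_scheme(presence_rows, "A1", 1))
```

In both cases the result corresponds to the "delete ./A1" state in the
tables above; the first function gets there with a single filter, while
the second has to look at the lower layers for every row.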

Flag in layer beneath makes reverting the whole operation harder: two
layers need to be modified.

Flag in layer beneath is redundant when overridden by a higher layer, in
other words when a node is present in the WC at this path.
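
One way to see why the flag is redundant in that case is to write down
how the visible state of a path would be computed under each scheme.
The following Python sketch is an assumption-laden illustration: the
dict-based row model and both function names are invented here, and the
flag is taken to matter only on the topmost row that exists for a path.

```python
def state_flag_scheme(rows, path):
    """rows maps (path, op_depth) -> (presence, deleted_by_layer_above).

    Only the topmost row for the path is consulted; a flag on any row
    hidden by a higher layer is never looked at, i.e. it is redundant.
    """
    layers = {d: v for (p, d), v in rows.items() if p == path}
    if not layers:
        return None
    presence, deleted_above = layers[max(layers)]
    return "deleted" if deleted_above else presence


def state_presence_scheme(rows, path):
    """rows maps (path, op_depth) -> presence; base_deleted means deleted."""
    layers = {d: v for (p, d), v in rows.items() if p == path}
    if not layers:
        return None
    presence = layers[max(layers)]
    return "deleted" if presence == "base_deleted" else presence


# State after "copy ^/B1 to ./A1, replacing ^/A1" (A1 subtree only).
flag_rows = {
    ("A1", 0): ("normal", True),       # flag set but hidden by op_depth=1
    ("A1/f.old", 0): ("normal", True), # flag is what makes this path deleted
    ("A1/f", 0): ("normal", True),
    ("A1", 1): ("normal", False),
    ("A1/f", 1): ("normal", False),
    ("A1/f.new", 1): ("normal", False),
}
presence_rows = {
    ("A1", 0): "normal",
    ("A1/f.old", 0): "normal",
    ("A1/f", 0): "normal",
    ("A1", 1): "normal",
    ("A1/f.old", 1): "base_deleted",
    ("A1/f", 1): "normal",
    ("A1/f.new", 1): "normal",
}

if __name__ == "__main__":
    for path in ("A1", "A1/f.old", "A1/f", "A1/f.new"):
        print(path, state_flag_scheme(flag_rows, path),
              state_presence_scheme(presence_rows, path))
```

Both functions agree for every path in the example; the flag on A1 and
A1/f is stored but never affects the answer.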

Flag in layer beneath copes with intentional deletion, but what about
'absent' and 'excluded'? It would be wise to be able to support them
too, and maybe we need 'not-present' as well. If so, it's a presence
value rather than a Boolean flag.

-----------------------

- Julian

Received on 2010-08-18 22:49:59 CEST