Re: NODE_DATA (2nd iteration)

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Tue, 03 Aug 2010 14:18:31 +0100

On Mon, 2010-07-12, Erik Huelsmann wrote:
> After lots of discussion regarding the way NODE_DATA/4th tree should
> be working, I'm now ready to post a summary of the progress. In my
> last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
> stated why we need this; this post is about the conclusion of what
> needs to happen. Also included are the first steps there.
>
>
> With the advent of NODE_DATA, we distinguish node values specifically
> related to BASE nodes, those specifically related to "current" WORKING
> nodes and those which are to be maintained for multiple levels of
> WORKING nodes (not only the "current" view) (the latter category is
> most often also shared with BASE).
>
> The respective tables will hold the columns shown below.
>
>
> -------------------------
> TABLE WORKING_NODE (
> wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
> local_relpath TEXT NOT NULL,
> parent_relpath TEXT,
> moved_here INTEGER,
> moved_to TEXT,
> original_repos_id INTEGER REFERENCES REPOSITORY (id),
> original_repos_path TEXT,
> original_revnum INTEGER,
> translated_size INTEGER,
> last_mod_time INTEGER, /* an APR date/time (usec since 1970) */
> keep_local INTEGER,
>
> PRIMARY KEY (wc_id, local_relpath)
> );
>
> CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
> --------------------------------
>
> The moved_* and original_* columns are typical examples of "WORKING
> fields only maintained for the visible WORKING nodes": the original_*
> and moved_* fields are inherited from the operation root by all
> children part of the operation. The operation root will be the visible
> change on its own level, meaning it'll have rows both in the
> WORKING_NODE and NODE_DATA tables. The fact that these columns are not
> in the WORKING_NODE table means that tree changes are not preserved
> accros overlapping changes. This is fully compatible with what we do
> today: changes to higher levels destroy changes to lower levels.
>
> The translated_size and last_mod_time columns exist in WORKING_NODE
> and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
> they exist in BASE_NODE is a bit of a hack: it's to prevent creation
> of WORKING_NODE data for every file which has keyword expansion or eol
> translation properties set: these columns serve only to optimize
> working copy scanning for changes and as such only relate to the
> visible WORKING_NODEs.
>

Can we come up with an English description of what each table will now
represent?

"The BASE_NODE table lists the existing node-revs in the repository that
comprise the mixed-revision tree that was most recently updated/switched
to or checked out. (The kind and content of these nodes is not here;
see the NODE_DATA table.)"

> TABLE BASE_NODE (
> wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
> local_relpath TEXT NOT NULL,
> repos_id INTEGER REFERENCES REPOSITORY (id),
> repos_relpath TEXT,

We need a revision number column here to go along with repos_id and
relpath to make a valid node-rev reference, don't we?

> parent_relpath TEXT,

(While we're reorganising, can we move that "parent_relpath" column to
adjacent to "local_relpath"?)

> translated_size INTEGER,
> last_mod_time INTEGER, /* an APR date/time (usec since 1970) */
> dav_cache BLOB,
> incomplete_children INTEGER,
> file_external TEXT,
>
> PRIMARY KEY (wc_id, local_relpath)
> );
>

"The NODE_DATA table records the kind and shallow content (props, text,
link target) of each node in the WC. It includes both the nodes that
comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
WC and also all nodes that are part of a copied or moved tree but
currently shadowed by a replacement performed inside that tree.

At least one row exists for each WC path, including paths with no change
and all paths affected by a tree change (add, delete, etc.). If the
same path is affected by multiple levels of tree change - a replacement
inside a copied directory, for example - then multiple rows exist with
different 'op_depth' values."

> TABLE NODE_DATA (
> wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
> local_relpath TEXT NOT NULL,
> op_depth INTEGER NOT NULL,
> presence TEXT NOT NULL,
> kind TEXT NOT NULL,
> checksum TEXT,
> changed_rev INTEGER,
> changed_date INTEGER, /* an APR date/time (usec since 1970) */
> changed_author TEXT,

The changed_* columns can only belong to a node-rev that exists in the
repository. What node-rev do they belong to and why aren't they
alongside the node-rev details?

(The changed_* columns convey essentially a rev number and two of the
rev-props associated with that revnum that can be used in keyword
expansions. We should consider representing that information in a more
general form, both to avoid tying the DB format to the choice of those
two particular revprops, and to avoid the redundancy of storing these
same data and author values N times.)

> depth TEXT,
> symlink_target TEXT,
> properties BLOB,

(While we're rearranging, can we group the node-content fields together:
kind, properties, checksum, symlink_target?)

> PRIMARY KEY (wc_id, local_relpath, oproot)

s/oproot/op_depth/?

> );
>
> CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
>
>
> Which leaves the NODE_DATA structure above. The op_depth column
> contains the depth of the node - relative to the wc root - on which
> the operation was run which caused the creation of the given NODE_DATA
> node. In the final scheme (based on single-db), the value will be 0
> for base and a positive integer for WORKING related data.

Let's assume single-db. By the last sentence, I understand: For each
BASE_NODE row there is a corresponding NODE_DATA row with 'op_root' = 0;
for every node brought in by a tree operation (copy, move, add) to an
immediate child of the WC root there is a NODE_DATA row with 'op_root' =
1; for every child of a child ... 2; and so on.

- Julian

> In order to be able to implement NODE_DATA even without having a fully
> functional SINGLE_DB yet, a transitional node numbering scheme needs
> to be devised. The following numbers will apply: BASE == 0,
> WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
>
>
> Other transitioning related remarks:
>
> * Conditional-protected experimentational sections, just like with SINGLE_DB
> * Initial implementation will simply replace the current
> functionality of the 2 tables, from there we can work our way through
> whatever needs doing.
> * Am I forgetting any others?
>
> Bye,
>
> Erik.
Received on 2010-08-03 15:19:17 CEST

This message: [ Message body ]
Next message: Ramkumar Ramachandra: "Re: [PATCH] Fix a historical detail in the commit editor"
Previous message: Julian Foad: "Re: NODE_DATA (2nd iteration)"
Maybe in reply to: Julian Foad: "Re: NODE_DATA (2nd iteration)"
Next in thread: Bert Huijben: "RE: NODE_DATA (2nd iteration)"
Reply: Bert Huijben: "RE: NODE_DATA (2nd iteration)"
Reply: Julian Foad: "Re: NODE_DATA (2nd iteration)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]