Re: NODE_DATA (2nd iteration)

From: Greg Stein <gstein_at_gmail.com>
Date: Tue, 10 Aug 2010 16:46:03 -0400

I'll work on reviewing this stuff. I believe there are quite a few
details that need to be worked out (like exact presence values).

Cheers,
-g

On Tue, Aug 10, 2010 at 12:18, Julian Foad <julian.foad_at_wandisco.com> wrote:
> Any responses would be greatly appreciated.
>
> - Julian
>
>
> On Tue, 2010-08-03, Julian Foad wrote:
>> On Mon, 2010-07-12, Erik Huelsmann wrote:
>> > After lots of discussion regarding the way NODE_DATA/4th tree should
>> > be working, I'm now ready to post a summary of the progress. In my
>> > last e-mail (http://svn.haxx.se/dev/archive-2010-07/0262.shtml) I
>> > stated why we need this; this post is about the conclusion of what
>> > needs to happen. Also included are the first steps there.
>> >
>> >
>> > With the advent of NODE_DATA, we distinguish node values specifically
>> > related to BASE nodes, those specifically related to "current" WORKING
>> > nodes and those which are to be maintained for multiple levels of
>> > WORKING nodes (not only the "current" view) (the latter category is
>> > most often also shared with BASE).
>> >
>> > The respective tables will hold the columns shown below.
>> >
>> >
>> > -------------------------
>> > TABLE WORKING_NODE (
>> > wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
>> > local_relpath TEXT NOT NULL,
>> > parent_relpath TEXT,
>> > moved_here INTEGER,
>> > moved_to TEXT,
>> > original_repos_id INTEGER REFERENCES REPOSITORY (id),
>> > original_repos_path TEXT,
>> > original_revnum INTEGER,
>> > translated_size INTEGER,
>> > last_mod_time INTEGER, /* an APR date/time (usec since 1970) */
>> > keep_local INTEGER,
>> >
>> > PRIMARY KEY (wc_id, local_relpath)
>> > );
>> >
>> > CREATE INDEX I_WORKING_PARENT ON WORKING_NODE (wc_id, parent_relpath);
>> > --------------------------------
>> >
>> > The moved_* and original_* columns are typical examples of "WORKING
>> > fields only maintained for the visible WORKING nodes": the original_*
>> > and moved_* fields are inherited from the operation root by all
>> > children that are part of the operation. The operation root will be the
>> > visible change on its own level, meaning it'll have rows both in the
>> > WORKING_NODE and NODE_DATA tables. The fact that these columns are not
>> > in the NODE_DATA table means that tree changes are not preserved
>> > across overlapping changes. This is fully compatible with what we do
>> > today: changes to higher levels destroy changes to lower levels.
>> >
>> > The translated_size and last_mod_time columns exist in WORKING_NODE
>> > and BASE_NODE; they explicitly don't exist in NODE_DATA. The fact that
>> > they exist in BASE_NODE is a bit of a hack: it's to prevent creation
>> > of WORKING_NODE data for every file which has keyword expansion or eol
>> > translation properties set: these columns serve only to optimize
>> > working copy scanning for changes and as such only relate to the
>> > visible WORKING_NODEs.
>> >
>>
>> Can we come up with an English description of what each table will now
>> represent?
>>
>> "The BASE_NODE table lists the existing node-revs in the repository that
>> comprise the mixed-revision tree that was most recently updated/switched
>> to or checked out. (The kind and content of these nodes are not here;
>> see the NODE_DATA table.)"
>>
>> > TABLE BASE_NODE (
>> > wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
>> > local_relpath TEXT NOT NULL,
>> > repos_id INTEGER REFERENCES REPOSITORY (id),
>> > repos_relpath TEXT,
>>
>> We need a revision number column here to go along with repos_id and
>> relpath to make a valid node-rev reference, don't we?
>>
>> > parent_relpath TEXT,
>>
>> (While we're reorganising, can we move that "parent_relpath" column to
>> adjacent to "local_relpath"?)
>>
>> > translated_size INTEGER,
>> > last_mod_time INTEGER, /* an APR date/time (usec since 1970) */
>> > dav_cache BLOB,
>> > incomplete_children INTEGER,
>> > file_external TEXT,
>> >
>> > PRIMARY KEY (wc_id, local_relpath)
>> > );
>> >
>>
>> "The NODE_DATA table records the kind and shallow content (props, text,
>> link target) of each node in the WC. It includes both the nodes that
>> comprise the currently 'visible' (or 'actual' or 'on-disk') state of the
>> WC and also all nodes that are part of a copied or moved tree but
>> currently shadowed by a replacement performed inside that tree.
>>
>> At least one row exists for each WC path, including paths with no change
>> and all paths affected by a tree change (add, delete, etc.). If the
>> same path is affected by multiple levels of tree change - a replacement
>> inside a copied directory, for example - then multiple rows exist with
>> different 'op_depth' values."
>>
>> > TABLE NODE_DATA (
>> > wc_id INTEGER NOT NULL REFERENCES WCROOT (id),
>> > local_relpath TEXT NOT NULL,
>> > op_depth INTEGER NOT NULL,
>> > presence TEXT NOT NULL,
>> > kind TEXT NOT NULL,
>> > checksum TEXT,
>> > changed_rev INTEGER,
>> > changed_date INTEGER, /* an APR date/time (usec since 1970) */
>> > changed_author TEXT,
>>
>> The changed_* columns can only belong to a node-rev that exists in the
>> repository. What node-rev do they belong to and why aren't they
>> alongside the node-rev details?
>>
>> (The changed_* columns convey essentially a rev number and two of the
>> rev-props associated with that revnum that can be used in keyword
>> expansions. We should consider representing that information in a more
>> general form, both to avoid tying the DB format to the choice of those
>> two particular revprops, and to avoid the redundancy of storing these
>> same date and author values N times.)
>>
>>
>> > depth TEXT,
>> > symlink_target TEXT,
>> > properties BLOB,
>>
>> (While we're rearranging, can we group the node-content fields together:
>> kind, properties, checksum, symlink_target?)
>>
>> > PRIMARY KEY (wc_id, local_relpath, oproot)
>>
>> s/oproot/op_depth/?
>>
>> > );
>> >
>> > CREATE INDEX I_NODE_WC_RELPATH ON NODE_DATA (wc_id, local_relpath);
>> >
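The layering described for NODE_DATA (several rows for one path, told apart by op_depth) can be sketched with Python's sqlite3 module. This is only an illustration under assumptions: the table is trimmed to a few columns, the primary key uses op_depth as suggested in the review comment, 'normal' is a placeholder presence value (the exact presence values are still open), and treating the row with the highest op_depth as the visible one is an assumption rather than something the proposal states. The helper visible_row() is hypothetical.

```python
import sqlite3

# In-memory database; trimmed-down NODE_DATA with only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE NODE_DATA (
        wc_id         INTEGER NOT NULL,
        local_relpath TEXT NOT NULL,
        op_depth      INTEGER NOT NULL,
        presence      TEXT NOT NULL,
        kind          TEXT NOT NULL,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    )
""")

# 'normal' is a placeholder presence value (assumption).
rows = [
    (1, "A",   0, "normal", "dir"),   # BASE rows use op_depth 0
    (1, "A/f", 0, "normal", "file"),
    (1, "C",   1, "normal", "dir"),   # C copied from A: operation root at depth 1
    (1, "C/f", 1, "normal", "file"),  # child brought in by that copy
    (1, "C/f", 2, "normal", "dir"),   # C/f replaced inside the copy: depth 2
]
conn.executemany("INSERT INTO NODE_DATA VALUES (?, ?, ?, ?, ?)", rows)


def visible_row(conn, wc_id, relpath):
    """Return (op_depth, presence, kind) of the topmost layer for a path.

    Assumption: the row with the highest op_depth shadows the others.
    Returns None if the path has no rows at all.
    """
    return conn.execute(
        "SELECT op_depth, presence, kind FROM NODE_DATA "
        "WHERE wc_id = ? AND local_relpath = ? "
        "ORDER BY op_depth DESC LIMIT 1",
        (wc_id, relpath),
    ).fetchone()
```

Under these assumptions, visible_row(conn, 1, "C/f") picks the op_depth 2 replacement, while the op_depth 1 row for the same path stays in the table as the shadowed node from the copy.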
>> >
>> > Which leaves the NODE_DATA structure above. The op_depth column
>> > contains the depth of the node - relative to the wc root - on which
>> > the operation was run which caused the creation of the given NODE_DATA
>> > node. In the final scheme (based on single-db), the value will be 0
>> > for base and a positive integer for WORKING related data.
>>
>> Let's assume single-db. By the last sentence, I understand: For each
>> BASE_NODE row there is a corresponding NODE_DATA row with 'op_depth' = 0;
>> for every node brought in by a tree operation (copy, move, add) to an
>> immediate child of the WC root there is a NODE_DATA row with 'op_depth' =
>> 1; for every child of a child ... 2; and so on.
>>
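The numbering in that reading (single-db case) amounts to counting the path components of the operation root relative to the WC root. A minimal sketch, assuming '/'-separated relpaths with no leading or trailing slash; the function name op_depth_for is hypothetical:

```python
def op_depth_for(op_root_relpath: str) -> int:
    """Depth of an operation root, counted in path components from the WC root.

    "A" -> 1 (immediate child of the WC root), "A/B" -> 2, and so on.
    The thread reserves 0 for BASE and does not say how an operation on the
    WC root itself would be numbered; this sketch simply returns 0 for "".
    """
    if op_root_relpath == "":
        return 0
    return op_root_relpath.count("/") + 1
```

Every node brought in by the same tree operation would carry the depth of the operation root, not its own depth: a copy rooted at "C" gives op_depth_for("C") for "C", "C/f" and everything below.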
>>
>> - Julian
>>
>>
>> > In order to be able to implement NODE_DATA even without having a fully
>> > functional SINGLE_DB yet, a transitional node numbering scheme needs
>> > to be devised. The following numbers will apply: BASE == 0,
>> > WORKING-this-dir == 1, WORKING-any-immediate-child == 2.
>> >
>> >
>> > Other transitioning related remarks:
>> >
>> > * Conditionally protected experimental sections, just like with SINGLE_DB
>> > * Initial implementation will simply replace the current
>> > functionality of the 2 tables; from there we can work our way through
>> > whatever needs doing.
>> > * Am I forgetting any others?
>> >
>> > Bye,
>> >
>> > Erik.
>>
>>
>
>
>
Received on 2010-08-10 22:46:42 CEST
