As described by Julian earlier this month, Julian, Philip and I observed
that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in
common. Notably, by introducing the NODE_DATA table, most fields from
BASE_NODE and WORKING_NODE already moved to a common table.
The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the
side of WORKING_NODE are the 2 cache fields 'translated_size' and
'last_mod_time'. Apart from those two, there are the indexing fields wc_id,
local_relpath and parent_relpath.
In the end we're storing *lots* of bytes (wc_id, local_relpath and
parent_relpath) to store 2 64-bit values.
On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path
and revision. The NODE_DATA table already has the fields original_repos_id,
original_repos_path and original_revision. When op_depth == 0, these are
guaranteed to be empty (null), since they are for working nodes with
copy/move source information. Renaming the three fields in NODE_DATA to
repos_id, repos_path and revision, generalizing their use to include
op_depth == 0 [ofcourse nicely documented in the table docs], BASE_NODE
would be reduced to a store of the dav_cache, translated_size and
last_mod_time fields.
By subsuming translated_size and last_mod_time into NODE_DATA, neither
WORKING_NODE nor BASE_NODE will need to store these values anymore. This
eliminates the entire reason of existence of WORKING_NODE. BASE_NODE then
only stores dav_cache. Here too, it's probably more efficient (in size) to
store dav_cache in NODE_DATA to prevent repeated storage of wc_id,
local_relpath and parent_relpath in BASE_NODE.
In addition to the eliminated storage overhead, we'd be making things a
little less complex for ourselves: UPDATE, INSERT and DELETE queries would
be operating only on a single table, removing the need to split updates
across multiple statements.
This week, I was discussing this change with Greg on IRC. We both have the
feeling this should work out well. The proposal here is to switch
(WORKING_NODE, NODE_DATA, BASE_NODE) into a single table --> NODES.
Comments? Fears? Enhancements?
Bye,
Erik.
Received on 2010-09-02 23:35:40 CEST