On Thu, Sep 2, 2010 at 4:34 PM, Erik Huelsmann <ehuels_at_gmail.com> wrote:
> As described by Julian earlier this month, Julian, Philip and I observed
> that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in
> common. Notably, by introducing the NODE_DATA table, most fields from
> BASE_NODE and WORKING_NODE already moved to a common table.
> The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the
> side of WORKING_NODE are the 2 cache fields 'translated_size' and
> 'last_mod_time'. Apart from those two, there are the indexing fields wc_id,
> local_relpath and parent_relpath.
> In the end we're storing *lots* of bytes (wc_id, local_relpath and
> parent_relpath) just to hold two 64-bit values.
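A rough back-of-the-envelope sketch of the overhead claim above. The two paths are hypothetical examples. The sketch counts only the UTF-8 text of the two relpath key columns and ignores the wc_id integer, SQLite record headers and index entries, so the real per-row cost is likely higher:

```python
# Illustrative estimate only: key-column text bytes vs. the 16 bytes of
# payload (two 64-bit integers) a WORKING_NODE row would still carry.
# The example paths are hypothetical; wc_id, SQLite record headers and
# index entries are deliberately ignored.

def key_overhead(local_relpath: str, parent_relpath: str) -> int:
    """Bytes spent on the relpath key columns for one row (UTF-8 text)."""
    return len(local_relpath.encode("utf-8")) + len(parent_relpath.encode("utf-8"))

PAYLOAD_BYTES = 2 * 8  # translated_size + last_mod_time, 64 bits each

local_relpath = "subversion/libsvn_wc/wc_db.c"
parent_relpath = "subversion/libsvn_wc"

overhead = key_overhead(local_relpath, parent_relpath)
print(overhead, PAYLOAD_BYTES)  # 48 16
```

Even this undercount spends three times as many bytes on keys as on the cached values.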
> On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path
> and revision. The NODE_DATA table already has the fields original_repos_id,
> original_repos_path and original_revision. When op_depth == 0, these are
> guaranteed to be empty (null), since they are for working nodes with
> copy/move source information. Renaming the three fields in NODE_DATA to
> repos_id, repos_path and revision, generalizing their use to include
> op_depth == 0 [of course nicely documented in the table docs], BASE_NODE
> would be reduced to a store of the dav_cache, translated_size and
> last_mod_time fields.
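A minimal sketch of the renaming described above, assuming a heavily simplified NODE_DATA table (the column list is a hypothetical illustration, not the actual wc-metadata.sql schema). The same repos_id / repos_path / revision columns hold the BASE location at op_depth 0 and the copy source at op_depth > 0:

```python
import sqlite3

# Hypothetical, simplified NODE_DATA with original_* renamed to
# repos_id / repos_path / revision. Not the real Subversion schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node_data (
        wc_id INTEGER NOT NULL,
        local_relpath TEXT NOT NULL,
        op_depth INTEGER NOT NULL,
        parent_relpath TEXT,
        -- op_depth = 0: the BASE node's repository location
        -- op_depth > 0: the copy/move source (NULL for plain adds)
        repos_id INTEGER,
        repos_path TEXT,
        revision INTEGER,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    )
""")
rows = [
    # BASE node checked out from the repository at r42.
    (1, "A/f", 0, "A", 1, "trunk/A/f", 42),
    # Working layer (op_depth > 0): A/g copied from trunk/A/f@40.
    (1, "A/g", 2, "A", 1, "trunk/A/f", 40),
]
conn.executemany("INSERT INTO node_data VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The same three columns answer both questions; only op_depth differs.
base = conn.execute(
    "SELECT repos_path, revision FROM node_data "
    "WHERE wc_id = ? AND local_relpath = ? AND op_depth = 0",
    (1, "A/f"),
).fetchone()
copy_from = conn.execute(
    "SELECT repos_path, revision FROM node_data "
    "WHERE wc_id = ? AND local_relpath = ? AND op_depth > 0 "
    "ORDER BY op_depth DESC LIMIT 1",
    (1, "A/g"),
).fetchone()
print(base, copy_from)  # ('trunk/A/f', 42) ('trunk/A/f', 40)
```

Because the original_* columns are guaranteed NULL at op_depth 0, reusing them there loses no information.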
> By subsuming translated_size and last_mod_time into NODE_DATA, neither
> WORKING_NODE nor BASE_NODE will need to store these values anymore. This
> eliminates WORKING_NODE's entire reason for existence. BASE_NODE then
> only stores dav_cache. Here too, it's probably more efficient (in size) to
> store dav_cache in NODE_DATA to prevent repeated storage of wc_id,
> local_relpath and parent_relpath in BASE_NODE.
> In addition to the eliminated storage overhead, we'd be making things a
> little less complex for ourselves: UPDATE, INSERT and DELETE queries would
> be operating only on a single table, removing the need to split updates
> across multiple statements.
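A minimal sketch of the single-table idea, assuming a hypothetical, simplified NODES table in which dav_cache, translated_size and last_mod_time are ordinary columns. Updating the file-info cache or deleting a BASE subtree each becomes one statement against one table:

```python
import sqlite3

# Hypothetical merged NODES table: BASE rows (op_depth = 0) and WORKING
# rows (op_depth > 0) live together, and the former BASE_NODE /
# WORKING_NODE-only columns become ordinary columns. Illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        wc_id INTEGER NOT NULL,
        local_relpath TEXT NOT NULL,
        op_depth INTEGER NOT NULL,
        parent_relpath TEXT,
        repos_id INTEGER,
        repos_path TEXT,
        revision INTEGER,
        dav_cache BLOB,
        translated_size INTEGER,
        last_mod_time INTEGER,
        PRIMARY KEY (wc_id, local_relpath, op_depth)
    )
""")
conn.executemany(
    "INSERT INTO nodes (wc_id, local_relpath, op_depth, parent_relpath,"
    " repos_id, repos_path, revision) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 0, "", 1, "trunk/A", 42),
        (1, "A/f", 0, "A", 1, "trunk/A/f", 42),
    ],
)

# Recording the cached file info is one UPDATE on one table.
with conn:
    conn.execute(
        "UPDATE nodes SET translated_size = ?, last_mod_time = ? "
        "WHERE wc_id = ? AND local_relpath = ? AND op_depth = 0",
        (1024, 1283438521000000, 1, "A/f"),
    )
size, mtime = conn.execute(
    "SELECT translated_size, last_mod_time FROM nodes "
    "WHERE wc_id = 1 AND local_relpath = 'A/f' AND op_depth = 0"
).fetchone()

# Removing the BASE layer of subtree A is likewise a single DELETE.
# (LIKE wildcards such as '_' in paths are not escaped in this sketch.)
with conn:
    conn.execute(
        "DELETE FROM nodes WHERE wc_id = ? AND op_depth = 0 "
        "AND (local_relpath = ? OR local_relpath LIKE ? || '/%')",
        (1, "A", "A"),
    )
remaining = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(size, remaining)  # 1024 0
```

With separate tables, each of these operations would instead need one statement per table, kept consistent inside a transaction.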
> This week, I was discussing this change with Greg on IRC. We both have the
> feeling this should work out well. The proposal here is to switch
> (WORKING_NODE, NODE_DATA, BASE_NODE) into a single table --> NODES.
> Comments? Fears? Enhancements?
I'm for the change, generally, though I admit I haven't been following
the discussion closely enough to evaluate the technical merits.
My one concern (and perhaps this comes from not following the
discussion closely enough) is how this impacts 1.7. This feels eerily
like an eleventh-hour redesign, and our track record with these in the
past hasn't been too stellar. I realize we've made some significant
progress in the past week with single-db and all that, but this just
feels like a fundamental change which has the potential to postpone
all the other 1.7 features we're eager to get into our users' hands.
The schema can *always* be improved; at what point are we allowing the
perfect to be the enemy of the good?
All that being said, if there is an SMOP in there somewhere, I'm happy
to unleash my OCD on it. :P
Received on 2010-09-03 04:52:01 CEST