Given all the responses in the thread, I'd say we're moving to the single
table for BASE and WORKING node recording. There was a flurry of activity
from me yesterday and this morning regarding NODE_DATA: that was just me
flushing my queue of patches.
The work isn't completely irrelevant, as it identifies the spots where the
NODES table will be introduced, just like NODE_DATA had to.
Today, I'll go to draw up the NODES table and move over the queries which
had already been modified for NODE_DATA over to the NODES design. I hope to
get a very long way today already. If it's not done today, then I expect it
to be able to finish it this week.
Anyone wanting to join in: let's chat on IRC.
Bye,
Erik.
On Thu, Sep 2, 2010 at 11:34 PM, Erik Huelsmann <ehuels_at_gmail.com> wrote:
>
>
> As described by Julian earlier this month, Julian, Philip and I observed
> that the BASE_NODE, WORKING_NODE and NODE_DATA tables have many fields in
> common. Notably, by introducing the NODE_DATA table, most fields from
> BASE_NODE and WORKING_NODE already moved to a common table.
>
> The remaining fields (after switching to NODE_DATA *and* SINGLE-DB) on the
> side of WORKING_NODE are the 2 cache fields 'translated_size' and
> 'last_mod_time'. Apart from those two, there are the indexing fields wc_id,
> local_relpath and parent_relpath.
>
> In the end we're storing *lots* of bytes (wc_id, local_relpath and
> parent_relpath) to store 2 64-bit values.
>
> On the side of BASE_NODE, we end up storing dav_cache, repos_id, repos_path
> and revision. The NODE_DATA table already has the fields original_repos_id,
> original_repos_path and original_revision. When op_depth == 0, these are
> guaranteed to be empty (null), since they are for working nodes with
> copy/move source information. Renaming the three fields in NODE_DATA to
> repos_id, repos_path and revision, generalizing their use to include
> op_depth == 0 [ofcourse nicely documented in the table docs], BASE_NODE
> would be reduced to a store of the dav_cache, translated_size and
> last_mod_time fields.
>
> By subsuming translated_size and last_mod_time into NODE_DATA, neither
> WORKING_NODE nor BASE_NODE will need to store these values anymore. This
> eliminates the entire reason of existence of WORKING_NODE. BASE_NODE then
> only stores dav_cache. Here too, it's probably more efficient (in size) to
> store dav_cache in NODE_DATA to prevent repeated storage of wc_id,
> local_relpath and parent_relpath in BASE_NODE.
>
> In addition to the eliminated storage overhead, we'd be making things a
> little less complex for ourselves: UPDATE, INSERT and DELETE queries would
> be operating only on a single table, removing the need to split updates
> across multiple statements.
>
>
> This week, I was discussing this change with Greg on IRC. We both have the
> feeling this should work out well. The proposal here is to switch
> (WORKING_NODE, NODE_DATA, BASE_NODE) into a single table --> NODES.
>
>
> Comments? Fears? Enhancements?
>
>
> Bye,
>
>
> Erik.
>
Received on 2010-09-06 10:31:51 CEST