Re: [RFC] Inheritable Properties Branch -- caching props in WC DB
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Thu, 13 Sep 2012 21:18:01 +0100 (BST)
Paul Burba wrote:
This branch caches properties of certain repository nodes, in a new place in the WC DB.
I haven't examined it at all yet. The Wiki page says: "The cache will be stored in a new BLOB column in the NODES table called 'inherited_props'. If the node is the base node of a WC root then this column is populated with a serialized skel of the node's inherited properties. In all other cases it is NULL."
Another new special column in the NODES table... the twenty-third column. Do we have to? Is it not complex enough already?
I want to make a radical suggestion.
What would it take to be able to share some data instead of creating a special-purpose column just for this cache? Well, it would take more work initially, but hear me out, as I have a hunch it may make less work in the end.
I think we would do well to factor out the storage of all data about a repository node-rev, moving that into a separate table that is *referenced* from the NODES table. That new table would be indexed by node-rev (that is: repos_id, rev,
What we need to know here (for inherited props) is the properties set on each ancestor of every WC "root node" (which is defined elsewhere and includes switched nodes etc.). That's not a complex requirement. Instead of creating a new column dedicated to storing the union of all the props inherited from every ancestor node, how about we fetch and concatenate, on request, the node-rev table rows for all the pathwise ancestors of the desired node? The cache maintenance then becomes a matter of ensuring that a set of node-rev table rows is populated (in the standard way for such rows), and that those rows are removed when no longer referenced.
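To make the "fetch and concatenate on request" idea concrete, here is a minimal sketch using Python's sqlite3 module. It is only an illustration under stated assumptions, not the WC DB schema or API: the table name `noderev`, its `properties` column (plain TEXT here, where the real WC stores serialized skels), the helper names `ancestor_paths` and `inherited_props`, and the sample property values are all hypothetical. Only the index columns (repos_id, revision, repos_path) come from the proposal.

```python
import sqlite3


def ancestor_paths(repos_path):
    """Return the pathwise ancestors of repos_path, repository root first.

    The empty string stands for the repository root (an assumption of this sketch).
    """
    parts = repos_path.split("/") if repos_path else []
    return ["/".join(parts[:i]) for i in range(len(parts))]


def inherited_props(db, repos_id, revision, repos_path):
    """Collect (ancestor_path, properties) from the cached ancestor rows.

    Ancestors with no cached row, or with no properties, are skipped.
    """
    result = []
    for path in ancestor_paths(repos_path):
        row = db.execute(
            "SELECT properties FROM noderev "
            "WHERE repos_id = ? AND revision = ? AND repos_path = ?",
            (repos_id, revision, path),
        ).fetchone()
        if row is not None and row[0] is not None:
            result.append((path, row[0]))
    return result


# Hypothetical table: indexed by node-rev, one row per cached repository node.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE noderev ("
    " repos_id INTEGER NOT NULL,"
    " revision INTEGER NOT NULL,"
    " repos_path TEXT NOT NULL,"
    " properties TEXT,"
    " PRIMARY KEY (repos_id, revision, repos_path))"
)
db.executemany(
    "INSERT INTO noderev VALUES (?, ?, ?, ?)",
    [
        (1, 10, "", "example:policy=strict"),
        (1, 10, "trunk", None),  # cached, but sets no properties
        (1, 10, "trunk/src", "example:owner=core"),
    ],
)

# Inherited props for a WC root checked out from trunk/src/lib at r10.
props = inherited_props(db, 1, 10, "trunk/src/lib")
```

The caller gets one entry per ancestor that actually sets properties, nearest-to-root first, so nothing needs to be stored as a pre-merged union.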
But this particular use is not the only reason for factoring out this data into a separate table. There are several:
- For inherited props, this use.
- For inherited props, if the current model of BASE node inheritance remains, we're also going to need to cache the inherited props for every mixed-rev node (that is, every node whose rev != its parent-in-BASE rev). (I'll write separately about why we need this for off-line operation.)
- We already have (I think) the case where a local WC-to-WC copy or move of a tree results in populating another set of tree nodes in the NODES table with an identical copy of all the data. Whenever we can reduce the amount of such copying, if the additional complexity is not too great, that can make it easier to achieve both correctness and efficiency.
- For conflicts, it's useful to store the "theirs" and "old base"
- I think it would make the WC *much* more maintainable. At the moment, there doesn't appear to be any codified connection between those eight columns in NODES. There doesn't even seem to be any comment in the doc strings to indicate that. It's almost impossible to analyze the huge query statements to check that they're all updated together.
Therefore I propose a WC development that adds a new table (call it NODEREV?), indexed by repos/rev/relpath, with eight further columns that will be moved out of the current NODES table:
repos_id, revision, repos_path -- the index columns
Then we add an indirection from the NODES table via the index columns whenever we access those columns. And some means of controlling creation and deletion of those rows, perhaps using a ref-count.
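One way the ref-count idea might look, again as a rough sketch in Python with sqlite3 rather than a worked-out design. The `noderev` table, its `properties` and `ref_count` columns, and the helpers `acquire`, `release` and `ref_count` are all hypothetical names for illustration. The rule assumed here is that a row is created on first reference and deleted when the last reference goes away.

```python
import sqlite3

# Hypothetical shared table, keyed by the proposed index columns.
SCHEMA = """
CREATE TABLE noderev (
  repos_id   INTEGER NOT NULL,
  revision   INTEGER NOT NULL,
  repos_path TEXT    NOT NULL,
  properties TEXT,
  ref_count  INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (repos_id, revision, repos_path)
)
"""

WHERE_KEY = "WHERE repos_id = ? AND revision = ? AND repos_path = ?"


def acquire(db, key, properties=None):
    """Reference the node-rev row for key, creating it if it is absent."""
    db.execute(
        "INSERT OR IGNORE INTO noderev "
        "(repos_id, revision, repos_path, properties) VALUES (?, ?, ?, ?)",
        key + (properties,),
    )
    db.execute("UPDATE noderev SET ref_count = ref_count + 1 " + WHERE_KEY, key)


def release(db, key):
    """Drop one reference; delete the row once nothing refers to it."""
    db.execute("UPDATE noderev SET ref_count = ref_count - 1 " + WHERE_KEY, key)
    db.execute("DELETE FROM noderev " + WHERE_KEY + " AND ref_count <= 0", key)


def ref_count(db, key):
    """Return the current reference count, or None if the row is gone."""
    row = db.execute("SELECT ref_count FROM noderev " + WHERE_KEY, key).fetchone()
    return None if row is None else row[0]


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)

key = (1, 10, "trunk/src")        # (repos_id, revision, repos_path)
acquire(db, key, "example:owner=core")  # referenced by the original NODES row
acquire(db, key)                  # a WC-to-WC copy shares the same row
release(db, key)                  # the copy is reverted; one reference remains
```

In this sketch a local copy costs one counter update instead of a second copy of the row's data, which is the sharing the proposal is after. Whether a counter or a "delete rows with no referrers" sweep is the better control is left open.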
I acknowledge that such a development is far from trivial, and I don't expect Paul to do it as part of this work. I just want to get the proposal out there and see if there is any consensus for working toward something like it.
In terms of the inheritable props branch, I wonder if there's any way we can move a bit closer to this proposal without going the whole hog, but I can't think of anything.
This is an archived mail posted to the Subversion Dev mailing list.