FS Node API

From: Stefan Fuhrmann <stefan2_at_apache.org>
Date: Sat, 09 Jan 2016 01:48:12 +0100

Hi Ivan & interested parties,

Excellent of you starting with the API work! For all
who don't know yet: Ivan and I talked about this idea
in Berlin and agreed that it sounds like a good idea.
I had played with some prototypic implementation FSX
and learned a bit about the its pitfalls.

Below there is a list of questions that I had when
reviewing the first few commits. There are also sections
on what I have in mind and partly in code for the new API.

-- Stefan^2.

Questions / Feedback to fs-node-api as of r1723819
--------------------------------------------------

[Not commenting on the implementation since that depends
on points in the other sections.]

* BRANCH-README: "The branch is NOT going to be merged to
trunk; I'm going to re-implement everything in trunk ..."

   - Why implementing it a second time (assuming the new
     API turns out to be useful)?
   - Do you expect the code to be "too prototypic" to be
     used for production? If so, any specific indication
     for that?
   - I think having the feature developed on a branch makes
     sense because the switch-over to the new API will be
     very noisy. Otherwise, it could be done on /trunk
     directly - and be reverted again if necessary.

* svn_fs.h

- Maybe, move the new API functions to new section.
They will not always be 1:1 replacements.

* "svn_fs_node_t does not depends on svn_fs_root object created from"

   - I'm fine with that. Any specific reason behind this
     other than that node_t should be be self-contained
     within fs_t and result_pool?

* "svn_fs_node_t always reference existing node in tree (?)"

   - Within a transaction, nodes of type svn_node_none
     make sense, e.g. after a deletion. Footnote: in-txn
     nodes do have a number of update / invalidation issues,
     which makes having separate vtables instances for
     in-txn and in-rev nodes a good idea.
   - I'm not too sure we should allow creating nodes for
     any random path but I guess we must at least have
     the option to do so b/c of symmetry.

* svn_fs_open_node

   - Could we name this svn_fs_node or svn_fs_get_node?
     "open" suggests that there are resources involved
     which require an implicit and sometimes explicit
     "close".

* svn_fs_file_length

   - I was thinking of naming that svn_fs_node_length.
     It could error out on non-files as it does today
     or return some informative value like the representation
     size. Caller logic needs to handle the "not a file"
     case anyway, so it could use the new info on its own
     discretion.

* svn_fs_dir_entries2

   - Again, this could simply be svn_fs_sub_nodes.
   - The result should be an array of node_t *, sorted
     by their names. This is what the storage layer
     delivers in FS* and it is what the repos layer
     would like to scan during reporting.

Additional Goals
----------------

* Use 2-pool paradigm.

* Where appropriate, make the API more orthogonal and
consistent (e.g. svn_fs_node_make instead of
svn_fs_node_make_file and svn_fs_node_make_dir).

Observations in my Experiments
------------------------------

* Despite calling it "node API" it actually concerned with
   efficiently navigating directories. So, always represent
   the node using 2 structs: the actual node plus a (usually
   shared) struct containing all the context info - representing
   the directory.

* The parameters for svn_fs_node_make, svn_fs_node_copy etc.
should take pure paths to describe the target instead of
parent-node + local-name.

* Some function API function names clash with existing ones,
i.e. need to be revved (svn_fs_node_history3,
svn_fs_node_relation2 and everything prop).

* Txn nodes and revision nodes should use 2 different
   instances of the vtable. Error handling for operations
   that either apply to txn nodes or revision nodes only
   can then be moved to lib_fs by checking for NULL pointers
   in the vtable.

Proposal for an Implementation Strategy
---------------------------------------

* node_compat.*
   - Provide a default / fallback implementation in lib_fs
     for backends that don't have native node API support.
   - Blueprint for backend code, i.e. ensure we can have
     a clean structure
   - Implement API (roughly) one function at a time.

* Ensure semantic equivalence to old API and high test
   coverage from the start
   - Implement node_compat in terms of the old backend API
   - Implement the old FS API in terms of the new API,
     i.e. route all calls through node_compat.*
   - Have #define that controls whether the old API is run
     natively or gets diverted through node_compat.*
   - If the backend vtable entry is NULL for an old API
     function, divert it through node_compat.*

* Switch API users over to the new API
- This is what makes the branch worthwhile.
- Deprecate the old API only after migrating most callers

* Implement the node API in FSFS
   - Take node_compat and only replace the vtable calls
     with direct function calls.
   - Add direct addressing info (noderev IDs) to the structs.
   - Switch functions over to "native" code.
   - NULL the old API vtable entries to enforce the new
     backend logic.

* Final review and merger into /trunk
- Report processing as in "diff --summary" is a good
performance indicator.
Received on 2016-01-09 01:48:10 CET

This message: [ Message body ]
Next message: Ivan Zhakov: "Re: FS Node API"
Previous message: Julian Foad: "Re: detection of moved branches for svn-normalizer tool"
Next in thread: Ivan Zhakov: "Re: FS Node API"
Reply: Ivan Zhakov: "Re: FS Node API"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]