Re: FS Node API

From: Ivan Zhakov <ivan_at_visualsvn.com>
Date: Sat, 9 Jan 2016 16:39:22 +0300

On 9 January 2016 at 03:48, Stefan Fuhrmann <stefan2_at_apache.org> wrote:
>
> Hi Ivan & interested parties,
>
Hi Stefan!

See my response inline.

> Excellent of you starting with the API work! For all
> who don't know yet: Ivan and I talked about this idea
> in Berlin and agreed that it sounds like a good idea.
> I had played with some prototypic implementation FSX
> and learned a bit about the its pitfalls.
>
> Below there is a list of questions that I had when
> reviewing the first few commits. There are also sections
> on what I have in mind and partly in code for the new API.
>
> -- Stefan^2.
>
>
> Questions / Feedback to fs-node-api as of r1723819
> --------------------------------------------------
>
> [Not commenting on the implementation since that depends
> on points in the other sections.]
>
> * BRANCH-README: "The branch is NOT going to be merged to
> trunk; I'm going to re-implement everything in trunk ..."
>
> - Why implementing it a second time (assuming the new
> API turns out to be useful)?
> - Do you expect the code to be "too prototypic" to be
> used for production? If so, any specific indication
> for that?
> - I think having the feature developed on a branch makes
> sense because the switch-over to the new API will be
> very noisy. Otherwise, it could be done on /trunk
> directly - and be reverted again if necessary.
>
I don't like developing new features in branches, so I consider this
branch as experiment to explore the FS node API idea and make
important decisions. It's not a big problem to re-implement all this
on trunk using multiple incremental changes with detailed log messages
and proper docstrings.

> * svn_fs.h
>
> - Maybe, move the new API functions to new section.
> They will not always be 1:1 replacements.
>
Sure, but for simplicity I just revving existing functions to accept
svn_fs_node_t instead ROOT+PATH.

> * "svn_fs_node_t does not depends on svn_fs_root object created from"
>
> - I'm fine with that. Any specific reason behind this
> other than that node_t should be be self-contained
> within fs_t and result_pool?
>
libsvn_repos/reporter.c open and closes svn_fs_root_t dynamically and
it will be very challenging to rewrite to maintain dependency between
svn_fs_root_t and svn_fs_node_t.

> * "svn_fs_node_t always reference existing node in tree (?)"
>
> - Within a transaction, nodes of type svn_node_none
> make sense, e.g. after a deletion. Footnote: in-txn
> nodes do have a number of update / invalidation issues,
> which makes having separate vtables instances for
> in-txn and in-rev nodes a good idea.
> - I'm not too sure we should allow creating nodes for
> any random path but I guess we must at least have
> the option to do so b/c of symmetry.
>

There are two possible design approaches:
1. svn_fs_node_t is just combination of ROOT+PATH and may point to
some never existent node.
2. svn_fs_node_t is always point to some existing node in tree. Yes,
we may end up with dangling node in case of TXN root nodes, but it's
out of scope.

I decided to go with (2) at this time.

> * svn_fs_open_node
>
> - Could we name this svn_fs_node or svn_fs_get_node?
> "open" suggests that there are resources involved
> which require an implicit and sometimes explicit
> "close".
>
This is intentional name: svn_fs_open_node() actually do some work to
check whether node exists.

> * svn_fs_file_length
>
> - I was thinking of naming that svn_fs_node_length.
> It could error out on non-files as it does today
> or return some informative value like the representation
> size. Caller logic needs to handle the "not a file"
> case anyway, so it could use the new info on its own
> discretion.
>
Makes sense, but for the branch I wanted to keep API like similar as
possible for easy switching callers.

> * svn_fs_dir_entries2
>
> - Again, this could simply be svn_fs_sub_nodes.
I kept existing semantic just to making porting to new API easier. We
can discuss this improvement later.

> - The result should be an array of node_t *, sorted
> by their names. This is what the storage layer
> delivers in FS* and it is what the repos layer
> would like to scan during reporting.
I think it's completely ortogonal to FS node API, so decided to move
this task out of scope of the branch. Btw current update reporter
implementation uses hashes very intensively.

> Additional Goals
> ----------------
>
> * Use 2-pool paradigm.
>
Yes, I already using 2-pool paradigm in all revv functions.

> * Where appropriate, make the API more orthogonal and
> consistent (e.g. svn_fs_node_make instead of
> svn_fs_node_make_file and svn_fs_node_make_dir).
>
I think it's of scope at least for experimental branch. Personally I'm
fine with current naming and would like to avoid just flavoring
changes.

>
> Observations in my Experiments
> ------------------------------
>
> * Despite calling it "node API" it actually concerned with
> efficiently navigating directories. So, always represent
> the node using 2 structs: the actual node plus a (usually
> shared) struct containing all the context info - representing
> the directory.
>
> * The parameters for svn_fs_node_make, svn_fs_node_copy etc.
> should take pure paths to describe the target instead of
> parent-node + local-name.
>
> * Some function API function names clash with existing ones,
> i.e. need to be revved (svn_fs_node_history3,
> svn_fs_node_relation2 and everything prop).
>
> * Txn nodes and revision nodes should use 2 different
> instances of the vtable. Error handling for operations
> that either apply to txn nodes or revision nodes only
> can then be moved to lib_fs by checking for NULL pointers
> in the vtable.
>
What is purpose of different vtables for txn and revision nodes?

>
> Proposal for an Implementation Strategy
> ---------------------------------------
>
> * node_compat.*
> - Provide a default / fallback implementation in lib_fs
> for backends that don't have native node API support.
> - Blueprint for backend code, i.e. ensure we can have
> a clean structure
> - Implement API (roughly) one function at a time.
>
> * Ensure semantic equivalence to old API and high test
> coverage from the start
> - Implement node_compat in terms of the old backend API
> - Implement the old FS API in terms of the new API,
> i.e. route all calls through node_compat.*
> - Have #define that controls whether the old API is run
> natively or gets diverted through node_compat.*
> - If the backend vtable entry is NULL for an old API
> function, divert it through node_compat.*
I've already added '#if 0' code for that. See svn_fs_open_node().
>
> * Switch API users over to the new API
> - This is what makes the branch worthwhile.
> - Deprecate the old API only after migrating most callers
>
> * Implement the node API in FSFS
> - Take node_compat and only replace the vtable calls
> with direct function calls.
> - Add direct addressing info (noderev IDs) to the structs.
I don't understand this point. Why we cannot use pointer to dag_node_t
for svn_fs_node_t implemention in FSFS?

> - Switch functions over to "native" code.
> - NULL the old API vtable entries to enforce the new
> backend logic.
>
> * Final review and merger into /trunk
> - Report processing as in "diff --summary" is a good
> performance indicator.
My first goal is to switch 'svn ls -v' to new API and see how it works.

-- 
Ivan Zhakov

Received on 2016-01-09 14:39:48 CET

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: [RFC/PATCH] Handling PROPFIND in mod_dav_svn (was: Re: Issue with browsing a SVN 1.9.2, schema 7, packed, repository)"
Previous message: Stefan Fuhrmann: "FS Node API"
In reply to: Stefan Fuhrmann: "FS Node API"
Next in thread: Stefan Fuhrmann: "Re: FS Node API"
Reply: Stefan Fuhrmann: "Re: FS Node API"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]