Hi Stefan.
> URL: http://svn.apache.org/viewvc?rev=1442088&view=rev
> Log:
> Improve DAG node / path lookup performance in FSFS. Since this is a hot
> path in many fs_repo-based functions (like fs_verify), 20 .. 30% time
> saved here can easily become significant.
>
> The idea here is that get_dag() already has knowledge about the PATH
> parameter and does not actually use all the result data returned by
> open_path(). So, define a number of flags that allow a caller to tell
> open_path() where it may take short-cuts based on the context.
>
> * subversion/libsvn_fs_fs/tree.c
> (dag_node_cache_set): add comment to document a rationale
> (enum open_path_flags_t): declare further flags
> (open_path): support the new flags to short-circuit portions of code
> (get_dag,
> fs_closest_copy,
> get_mergeinfo_for_path_internal): provide flags to open_path()
> + /* When this flag is set, don't bother to lookup the DAG node in
> + our caches because we already tried this. Ignoring this flag
> + has no functional impact. */
> + open_path_uncached = 2,
> +
> + /* Assume that the path parameter is already in canonical form. */
> + open_path_is_canonical = 4,
I don't like this 'is_canonical' flag. In the rest of Subversion, nearly all APIs assume paths are canonical already, and for a good reason: re-canonicalizing at a lower level tends to be expensive because it happens more times and with less knowledge of whether it's needed. I think it would be better (including faster) to simply require that the path is canonical, and make sure the callers honour that.
I see that many of the functions in this file currently appear to be accepting and dealing with non-canonical paths, but I think the way to go is to change that.
> + /* The caller does not care about the parent node chain but only
> + the final DAG node. */
> + open_path_node_only = 8
> } open_path_flags_t;
>
>
> @@ -871,6 +885,10 @@ typedef enum open_path_flags_t {
> callers that create new nodes --- we find the parent directory for
> them, and tell them whether the entry exists already.
>
> + The remaining bits in FLAGS are hints that allow this function
> + to take shortcuts based on knowledge that the caller provides,
> + such as the fact that PATH is already in canonical form.
Maybe give a different example.
> NOTE: Public interfaces which only *read* from the filesystem
> should not call this function directly, but should instead use
> get_dag().
> @@ -884,23 +902,47 @@ open_path(parent_path_t **parent_path_p,
> apr_pool_t *pool)
> {
> svn_fs_t *fs = root->fs;
> - dag_node_t *here; /* The directory we're currently looking at. */
> - parent_path_t *parent_path; /* The path from HERE up to the root. */
> + dag_node_t *here = NULL; /* The directory we're currently looking at. */
> + parent_path_t *parent_path; /* The path from HERE up to the root. */
> const char *rest; /* The portion of PATH we haven't traversed yet. */
> - const char *canon_path = svn_fs__is_canonical_abspath(path)
> +
> + /* ensure a canonical path representation */
> + const char *canon_path = ( (flags & open_path_is_canonical)
> + || svn_fs__is_canonical_abspath(path))
> ? path
> : svn_fs__canonicalize_abspath(path, pool);
Here you seem also to have decided that when canonicalizing, it's more efficient to test with svn_fs__is_canonical_abspath() before calling svn_fs__canonicalize_abspath(). Are you sure that is true? If it is true here (and I see you already have the same pattern in one of the callers), then perhaps it is true in general, so why not do it inside svn_fs__canonicalize_abspath() so all callers benefit?
> @@ -1142,7 +1195,9 @@ get_dag(dag_node_t **dag_node_p,
> {
> /* Call open_path with no flags, as we want this to return an
> * error if the node for which we are searching doesn't exist. */
The comment was falsified by your change and needs updating. (There are similar comments on open_path() calls elsewhere in the file, which probably should be updated in the same way even though those calls are still passing no flags.)
> - SVN_ERR(open_path(&parent_path, root, path, 0, NULL, pool));
> + SVN_ERR(open_path(&parent_path, root, path,
> + open_path_uncached | open_path_is_canonical
> + | open_path_node_only, NULL, pool));
(This is the only significant place where the 'is_canonical' flag is used.)
> node = parent_path->node;
>
> /* No need to cache our find -- open_path() will do that for us. */
> @@ -3162,7 +3217,7 @@ static svn_error_t *fs_closest_copy(svn_
> if (kind == svn_node_none)
> return SVN_NO_ERROR;
> SVN_ERR(open_path(©_dst_parent_path, copy_dst_root, path,
> - 0, NULL, pool));
> + open_path_node_only, NULL, pool));
> copy_dst_node = copy_dst_parent_path->node;
> if (! svn_fs_fs__id_check_related(svn_fs_fs__dag_get_id(copy_dst_node),
>
> svn_fs_fs__dag_get_id(parent_path->node)))
> @@ -3775,7 +3830,8 @@ get_mergeinfo_for_path_internal(svn_merg
>
> path = svn_fs__canonicalize_abspath(path, scratch_pool);
>
> - SVN_ERR(open_path(&parent_path, rev_root, path, 0, NULL, scratch_pool));
> + SVN_ERR(open_path(&parent_path, rev_root, path, open_path_is_canonical,
> + NULL, scratch_pool));
Here you've left a plain 'canonicalize' in the caller, even though you appear to think that the version inside open_path() is more efficient. (The result of this 'canonicalize' here is *only* passed to open_path().)
> if (inherit == svn_mergeinfo_nearest_ancestor && !
> parent_path->parent)
> return SVN_NO_ERROR;
- Julian
Received on 2013-02-04 19:59:45 CET