[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: r1442088 - Improve DAG node / path lookup performance in FSFS

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Mon, 4 Feb 2013 18:59:08 +0000 (GMT)

Hi Stefan.

> URL: http://svn.apache.org/viewvc?rev=1442088&view=rev
> Log:
> Improve DAG node / path lookup performance in FSFS.  Since this is a hot
> path in many fs_repo-based functions (like fs_verify), 20 .. 30% time
> saved here can easily become significant.
>
> The idea here is that get_dag() already has knowledge about the PATH
> parameter and does not actually use all the result data returned by
> open_path(). So, define a number of flags that allow a caller to tell
> open_path() where it may take short-cuts based on the context.
>
> * subversion/libsvn_fs_fs/tree.c
>   (dag_node_cache_set): add comment to document a rationale
>   (enum open_path_flags_t): declare further flags
>   (open_path): support the new flags to short-circuit portions of code
>   (get_dag,
>    fs_closest_copy,
>    get_mergeinfo_for_path_internal): provide flags to open_path()

> +  /* When this flag is set, don't bother to lookup the DAG node in
> +     our caches because we already tried this.  Ignoring this flag
> +     has no functional impact.  */
> +  open_path_uncached = 2,
> +
> +  /* Assume that the path parameter is already in canonical form. */
> +  open_path_is_canonical = 4,

I don't like this 'is_canonical' flag.  In the rest of Subversion, nearly all APIs assume paths are canonical already, and for a good reason: re-canonicalizing at a lower level tends to be expensive because it happens more times and with less knowledge of whether it's needed.  I think it would be better (including faster) to simply require that the path is canonical, and make sure the callers honour that.

I see that many of the functions in this file currently appear to be accepting and dealing with non-canonical paths, but I think the way to go is to change that.

> +  /* The caller does not care about the parent node chain but only
> +     the final DAG node. */
> +  open_path_node_only = 8
> } open_path_flags_t;
>
>
> @@ -871,6 +885,10 @@ typedef enum open_path_flags_t {
>     callers that create new nodes --- we find the parent directory for
>     them, and tell them whether the entry exists already.
>
> +   The remaining bits in FLAGS are hints that allow this function
> +   to take shortcuts based on knowledge that the caller provides,
> +   such as the fact that PATH is already in canonical form.

Maybe give a different example.

>     NOTE: Public interfaces which only *read* from the filesystem
>     should not call this function directly, but should instead use
>     get_dag().
> @@ -884,23 +902,47 @@ open_path(parent_path_t **parent_path_p,
>            apr_pool_t *pool)
> {
>    svn_fs_t *fs = root->fs;
> -  dag_node_t *here; /* The directory we're currently looking at.  */
> -  parent_path_t *parent_path; /* The path from HERE up to the root.  */
> +  dag_node_t *here = NULL; /* The directory we're currently looking at.  */
> +  parent_path_t *parent_path; /* The path from HERE up to the root. */
>    const char *rest; /* The portion of PATH we haven't traversed yet.  */
> -  const char *canon_path = svn_fs__is_canonical_abspath(path)
> +
> +  /* ensure a canonical path representation */
> +  const char *canon_path = (   (flags & open_path_is_canonical)
> +                            || svn_fs__is_canonical_abspath(path))
>                           ? path
>                           : svn_fs__canonicalize_abspath(path, pool);

Here you seem also to have decided that when canonicalizing, it's more efficient to test with svn_fs__is_canonical_abspath() before calling svn_fs__canonicalize_abspath().  Are you sure that is true?  If it is true here (and I see you already have the same pattern in one of the callers), then perhaps it is true in general, so why not do it inside svn_fs__canonicalize_abspath() so all callers benefit?

> @@ -1142,7 +1195,9 @@ get_dag(dag_node_t **dag_node_p,
>          {
>            /* Call open_path with no flags, as we want this to return an
>             * error if the node for which we are searching doesn't exist. */

The comment was falsified by your change and needs updating.  (There are similar comments on open_path() calls elsewhere in the file, which probably should be updated in the same way even though those calls are still passing no flags.)

> -          SVN_ERR(open_path(&parent_path, root, path, 0, NULL, pool));
> +          SVN_ERR(open_path(&parent_path, root, path,
> +                            open_path_uncached | open_path_is_canonical
> +                            | open_path_node_only, NULL, pool));

(This is the only significant place where the 'is_canonical' flag is used.)

>            node = parent_path->node;
>
>            /* No need to cache our find -- open_path() will do that for us. */
> @@ -3162,7 +3217,7 @@ static svn_error_t *fs_closest_copy(svn_
>    if (kind == svn_node_none)
>      return SVN_NO_ERROR;
>    SVN_ERR(open_path(&copy_dst_parent_path, copy_dst_root, path,
> -                    0, NULL, pool));
> +                    open_path_node_only, NULL, pool));
>    copy_dst_node = copy_dst_parent_path->node;
>    if (! svn_fs_fs__id_check_related(svn_fs_fs__dag_get_id(copy_dst_node),
>                                    
> svn_fs_fs__dag_get_id(parent_path->node)))
> @@ -3775,7 +3830,8 @@ get_mergeinfo_for_path_internal(svn_merg
>
>    path = svn_fs__canonicalize_abspath(path, scratch_pool);
>
> -  SVN_ERR(open_path(&parent_path, rev_root, path, 0, NULL, scratch_pool));
> +  SVN_ERR(open_path(&parent_path, rev_root, path, open_path_is_canonical,
> +                    NULL, scratch_pool));

Here you've left a plain 'canonicalize' in the caller, even though you appear to think that the version inside open_path() is more efficient.  (The result of this 'canonicalize' here is *only* passed to open_path().)

>    if (inherit == svn_mergeinfo_nearest_ancestor && !
> parent_path->parent)
>      return SVN_NO_ERROR;

- Julian
Received on 2013-02-04 19:59:45 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.