[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RE: r1442088 - Improve DAG node / path lookup performance in FSFS

From: Bert Huijben <bert_at_qqmail.nl>
Date: Fri, 22 Feb 2013 16:19:02 +0100

Replying on some part:
(Can we please use plain text. I’m often guilty myself too, but this thread
is getting too far and I can’t tell the difference between who wrote what)

[Julian]
I don't like this 'is_canonical' flag.  In the rest of Subversion, nearly
all APIs assume paths are canonical already, and for a good reason:
re-canonicalizing at a lower level tends to be expensive because it happens
more times and with less knowledge of whether it's needed.  I think it would
be better (including faster) to simply require that the path is canonical,
and make sure the callers honour that.

I see that many of the functions in this file currently appear to be
accepting and dealing with non-canonical paths, but I think the way to go is
to change that.
<<

[Stefan]
Turned out not to be too much work to fix all callers.
<<

I still see that we canonicalize the paths on +- every api. (About 20 calls
in tree.c and dag.c)

Eventually this api should move to something like an
assert(is_canonical(path,…)) on all functions and just assume that callers
do the right thing, like we do everywhere else

Canonicalizing is everywhere is expensive and error prone. API entry points
that must assume non canonical paths should be deprecated, with the
deprecated function handling the canonicalization. (Like
svn_path_url_add_component which had some assumptions built in).

All our apis assume paths are canonical and I currently see code (get_dag())
that tries something and if it fails canonicalizes the path and tries again.

We should really get away from this kind of patterns wherever we can.
(Security boundaries might be an exception)

        Bert

From: Stefan Fuhrmann [mailto:stefan.fuhrmann_at_wandisco.com]
Sent: donderdag 21 februari 2013 22:56
To: Julian Foad
Cc: dev_at_subversion.apache.org
Subject: Re: r1442088 - Improve DAG node / path lookup performance in FSFS

On Mon, Feb 4, 2013 at 7:59 PM, Julian Foad <julianfoad_at_btopenworld.com>
wrote:
Hi Stefan.

> URL: http://svn.apache.org/viewvc?rev=1442088&view=rev
> Log:
> Improve DAG node / path lookup performance in FSFS.  Since this is a hot
> path in many fs_repo-based functions (like fs_verify), 20 .. 30% time
> saved here can easily become significant.
>
> The idea here is that get_dag() already has knowledge about the PATH
> parameter and does not actually use all the result data returned by
> open_path(). So, define a number of flags that allow a caller to tell
> open_path() where it may take short-cuts based on the context.
>
> * subversion/libsvn_fs_fs/tree.c
>   (dag_node_cache_set): add comment to document a rationale
>   (enum open_path_flags_t): declare further flags
>   (open_path): support the new flags to short-circuit portions of code
>   (get_dag,
>    fs_closest_copy,
>    get_mergeinfo_for_path_internal): provide flags to open_path()

> +  /* When this flag is set, don't bother to lookup the DAG node in
> +     our caches because we already tried this.  Ignoring this flag
> +     has no functional impact.  */
> +  open_path_uncached = 2,
> +
> +  /* Assume that the path parameter is already in canonical form. */
> +  open_path_is_canonical = 4,

I don't like this 'is_canonical' flag.  In the rest of Subversion, nearly
all APIs assume paths are canonical already, and for a good reason:
re-canonicalizing at a lower level tends to be expensive because it happens
more times and with less knowledge of whether it's needed.  I think it would
be better (including faster) to simply require that the path is canonical,
and make sure the callers honour that.

I see that many of the functions in this file currently appear to be
accepting and dealing with non-canonical paths, but I think the way to go is
to change that.

Turned out not to be too much work to fix all callers.
> +  /* The caller does not care about the parent node chain but only
> +     the final DAG node. */
> +  open_path_node_only = 8
> } open_path_flags_t;
>
>
> @@ -871,6 +885,10 @@ typedef enum open_path_flags_t {
>     callers that create new nodes --- we find the parent directory for
>     them, and tell them whether the entry exists already.
>
> +   The remaining bits in FLAGS are hints that allow this function
> +   to take shortcuts based on knowledge that the caller provides,
> +   such as the fact that PATH is already in canonical form.

Maybe give a different example.

Done.
 
>     NOTE: Public interfaces which only *read* from the filesystem
>     should not call this function directly, but should instead use
>     get_dag().
> @@ -884,23 +902,47 @@ open_path(parent_path_t **parent_path_p,
>            apr_pool_t *pool)
> {
>    svn_fs_t *fs = root->fs;
> -  dag_node_t *here; /* The directory we're currently looking at.  */
> -  parent_path_t *parent_path; /* The path from HERE up to the root.  */
> +  dag_node_t *here = NULL; /* The directory we're currently looking at. 
*/
> +  parent_path_t *parent_path; /* The path from HERE up to the root. */
>    const char *rest; /* The portion of PATH we haven't traversed yet.  */
> -  const char *canon_path = svn_fs__is_canonical_abspath(path)
> +
> +  /* ensure a canonical path representation */
> +  const char *canon_path = (   (flags & open_path_is_canonical)
> +                            || svn_fs__is_canonical_abspath(path))
>                           ? path
>                           : svn_fs__canonicalize_abspath(path, pool);

Here you seem also to have decided that when canonicalizing, it's more
efficient to test with svn_fs__is_canonical_abspath() before calling
svn_fs__canonicalize_abspath().  Are you sure that is true?  If it is true
here (and I see you already have the same pattern in one of the callers),
then perhaps it is true in general, so why not do it inside
svn_fs__canonicalize_abspath() so all callers benefit?

It's now all inside svn_fs__canonicalize_abspath, which has now
subtly changed semantics. But all existing callers should be fine
with that.
> @@ -1142,7 +1195,9 @@ get_dag(dag_node_t **dag_node_p,
>          {
>            /* Call open_path with no flags, as we want this to return an
>             * error if the node for which we are searching doesn't exist.
*/

The comment was falsified by your change and needs updating.  (There are
similar comments on open_path() calls elsewhere in the file, which probably
should be updated in the same way even though those calls are still passing
no flags.)

> -          SVN_ERR(open_path(&parent_path, root, path, 0, NULL, pool));
> +          SVN_ERR(open_path(&parent_path, root, path,
> +                            open_path_uncached | open_path_is_canonical
> +                            | open_path_node_only, NULL, pool));

(This is the only significant place where the 'is_canonical' flag is used.)

>            node = parent_path->node;
>
>            /* No need to cache our find -- open_path() will do that for
us. */
> @@ -3162,7 +3217,7 @@ static svn_error_t *fs_closest_copy(svn_
>    if (kind == svn_node_none)
>      return SVN_NO_ERROR;
>    SVN_ERR(open_path(&copy_dst_parent_path, copy_dst_root, path,
> -                    0, NULL, pool));
> +                    open_path_node_only, NULL, pool));
>    copy_dst_node = copy_dst_parent_path->node;
>    if (! svn_fs_fs__id_check_related(svn_fs_fs__dag_get_id(copy_dst_node),
>                                    
> svn_fs_fs__dag_get_id(parent_path->node)))
> @@ -3775,7 +3830,8 @@ get_mergeinfo_for_path_internal(svn_merg
>
>    path = svn_fs__canonicalize_abspath(path, scratch_pool);
>
> -  SVN_ERR(open_path(&parent_path, rev_root, path, 0, NULL,
scratch_pool));
> +  SVN_ERR(open_path(&parent_path, rev_root, path, open_path_is_canonical,
> +                    NULL, scratch_pool));

Here you've left a plain 'canonicalize' in the caller, even though you
appear to think that the version inside open_path() is more efficient.  (The
result of this 'canonicalize' here is *only* passed to open_path().)

>    if (inherit == svn_mergeinfo_nearest_ancestor && !
> parent_path->parent)
>      return SVN_NO_ERROR;

- Julian

Changes committed in r1448810 with a follow-up in r1448820.

Thanks for the review!

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download
Received on 2013-02-22 16:19:42 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.