[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: r1442088 - Improve DAG node / path lookup performance in FSFS

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 21 Feb 2013 22:56:14 +0100

On Mon, Feb 4, 2013 at 7:59 PM, Julian Foad <julianfoad_at_btopenworld.com>wrote:

> Hi Stefan.
>
> > URL: http://svn.apache.org/viewvc?rev=1442088&view=rev
> > Log:
> > Improve DAG node / path lookup performance in FSFS. Since this is a hot
> > path in many fs_repo-based functions (like fs_verify), 20 .. 30% time
> > saved here can easily become significant.
> >
> > The idea here is that get_dag() already has knowledge about the PATH
> > parameter and does not actually use all the result data returned by
> > open_path(). So, define a number of flags that allow a caller to tell
> > open_path() where it may take short-cuts based on the context.
> >
> > * subversion/libsvn_fs_fs/tree.c
> > (dag_node_cache_set): add comment to document a rationale
> > (enum open_path_flags_t): declare further flags
> > (open_path): support the new flags to short-circuit portions of code
> > (get_dag,
> > fs_closest_copy,
> > get_mergeinfo_for_path_internal): provide flags to open_path()
>
>
> > + /* When this flag is set, don't bother to lookup the DAG node in
> > + our caches because we already tried this. Ignoring this flag
> > + has no functional impact. */
> > + open_path_uncached = 2,
> > +
> > + /* Assume that the path parameter is already in canonical form. */
> > + open_path_is_canonical = 4,
>
> I don't like this 'is_canonical' flag. In the rest of Subversion, nearly
> all APIs assume paths are canonical already, and for a good reason:
> re-canonicalizing at a lower level tends to be expensive because it happens
> more times and with less knowledge of whether it's needed. I think it
> would be better (including faster) to simply require that the path is
> canonical, and make sure the callers honour that.
>
> I see that many of the functions in this file currently appear to be
> accepting and dealing with non-canonical paths, but I think the way to go
> is to change that.
>

Turned out not to be too much work to fix all callers.

> + /* The caller does not care about the parent node chain but only
> > + the final DAG node. */
> > + open_path_node_only = 8
> > } open_path_flags_t;
> >
> >
> > @@ -871,6 +885,10 @@ typedef enum open_path_flags_t {
> > callers that create new nodes --- we find the parent directory for
> > them, and tell them whether the entry exists already.
> >
> > + The remaining bits in FLAGS are hints that allow this function
> > + to take shortcuts based on knowledge that the caller provides,
> > + such as the fact that PATH is already in canonical form.
>
> Maybe give a different example.
>

Done.

> > NOTE: Public interfaces which only *read* from the filesystem
> > should not call this function directly, but should instead use
> > get_dag().
> > @@ -884,23 +902,47 @@ open_path(parent_path_t **parent_path_p,
> > apr_pool_t *pool)
> > {
> > svn_fs_t *fs = root->fs;
> > - dag_node_t *here; /* The directory we're currently looking at. */
> > - parent_path_t *parent_path; /* The path from HERE up to the root. */
> > + dag_node_t *here = NULL; /* The directory we're currently looking
> at. */
> > + parent_path_t *parent_path; /* The path from HERE up to the root. */
> > const char *rest; /* The portion of PATH we haven't traversed yet. */
> > - const char *canon_path = svn_fs__is_canonical_abspath(path)
> > +
> > + /* ensure a canonical path representation */
> > + const char *canon_path = ( (flags & open_path_is_canonical)
> > + || svn_fs__is_canonical_abspath(path))
> > ? path
> > : svn_fs__canonicalize_abspath(path, pool);
>
> Here you seem also to have decided that when canonicalizing, it's more
> efficient to test with svn_fs__is_canonical_abspath() before calling
> svn_fs__canonicalize_abspath(). Are you sure that is true? If it is true
> here (and I see you already have the same pattern in one of the callers),
> then perhaps it is true in general, so why not do it inside
> svn_fs__canonicalize_abspath() so all callers benefit?
>

It's now all inside svn_fs__canonicalize_abspath, which has now
subtly changed semantics. But all existing callers should be fine
with that.

> @@ -1142,7 +1195,9 @@ get_dag(dag_node_t **dag_node_p,
> > {
> > /* Call open_path with no flags, as we want this to return an
> > * error if the node for which we are searching doesn't
> exist. */
>
> The comment was falsified by your change and needs updating. (There are
> similar comments on open_path() calls elsewhere in the file, which probably
> should be updated in the same way even though those calls are still passing
> no flags.)
>
> > - SVN_ERR(open_path(&parent_path, root, path, 0, NULL, pool));
> > + SVN_ERR(open_path(&parent_path, root, path,
> > + open_path_uncached | open_path_is_canonical
> > + | open_path_node_only, NULL, pool));
>
>
> (This is the only significant place where the 'is_canonical' flag is used.)
>
> > node = parent_path->node;
> >
> > /* No need to cache our find -- open_path() will do that for
> us. */
> > @@ -3162,7 +3217,7 @@ static svn_error_t *fs_closest_copy(svn_
> > if (kind == svn_node_none)
> > return SVN_NO_ERROR;
> > SVN_ERR(open_path(&copy_dst_parent_path, copy_dst_root, path,
> > - 0, NULL, pool));
> > + open_path_node_only, NULL, pool));
> > copy_dst_node = copy_dst_parent_path->node;
> > if (!
> svn_fs_fs__id_check_related(svn_fs_fs__dag_get_id(copy_dst_node),
> >
> > svn_fs_fs__dag_get_id(parent_path->node)))
> > @@ -3775,7 +3830,8 @@ get_mergeinfo_for_path_internal(svn_merg
> >
> > path = svn_fs__canonicalize_abspath(path, scratch_pool);
> >
> > - SVN_ERR(open_path(&parent_path, rev_root, path, 0, NULL,
> scratch_pool));
> > + SVN_ERR(open_path(&parent_path, rev_root, path,
> open_path_is_canonical,
> > + NULL, scratch_pool));
>
>
> Here you've left a plain 'canonicalize' in the caller, even though you
> appear to think that the version inside open_path() is more efficient.
> (The result of this 'canonicalize' here is *only* passed to open_path().)
>
>
> > if (inherit == svn_mergeinfo_nearest_ancestor && !
> > parent_path->parent)
> > return SVN_NO_ERROR;
>
>
> - Julian
>
>
Changes committed in r1448810 with a follow-up in r1448820.

Thanks for the review!

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
*
http://www.wandisco.com/subversion/download
*
Received on 2013-02-21 22:56:47 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.