[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] improve speed of non-verbose svn list

From: <kfogel_at_collab.net>
Date: 2005-10-25 22:46:19 CEST

Garrett Rooney <rooneg@electricjellyfish.net> writes:
> [[[
> Speed up non-verbose 'svn list' by a scary amount by allowing the server
> to avoid calculating parts of the svn_dirent_t objects that are never used.
>
> * subversion/libsvn_ra/ra_loader.c
> (svn_ra_get_dir2): new wrapper function.
>
> * subversion/libsvn_ra/ra_loader.h
> (svn_ra__vtable_t): add get_dir2 function.
>
> * subversion/include/svn_types.h
> (SVN_DIRENT_KIND,
> SVN_DIRENT_SIZE,
> SVN_DIRENT_HAS_PROPS,
> SVN_DIRENT_CREATED_REV,
> SVN_DIRENT_TIME,
> SVN_DIRENT_LAST_AUTHOR): new bitfield values for each field in a dirent.
> (SVN_DIRENT_ALL): combination of all the other bitfield values.
>
> * subversion/include/svn_client.h
> (svn_client_ls4): new function, like svn_client_ls3, but with a dirent
> fields parameter.
> (svn_client_ls3): deprecated in favor of svn_client_ls4.
>
> * subversion/include/svn_ra.h
> (svn_ra_get_dir2): new version of svn_ra_get_dir, takes a dirent fields
> parameter.
> (svn_ra_get_dir): deprecate in favor of svn_ra_get_dir2.
>
> * subversion/libsvn_ra_local/ra_plugin.c
> (svn_ra_local__get_dir2): new function, currently ignores the dirent
> fields parameter.
> (svn_ra_local__get_dir): call into svn_ra_local__get_dir, passing it
> SVN_DIRENT_ALL for the dirent_fields parameter.
> (ra_local_vtable): add svn_ra_Local__get_dir2.
>
> * subversion/libsvn_client/ls.c
> (get_dir_contents): add a dirent_fields parameter, use svn_ra_get_dir2,
> update recursive call.
> (svn_client_ls4): new list function, adds a dirent_fields parameter which
> is passed to get_dir_contents.
> (svn_client_ls3): backwards compat wrapper, calls svn_client_ls4 with the
> dirent_fields set to SVN_DIRENT_ALL.
>
> * subversion/clients/cmdline/ls-cmd.c
> (svn_cl__ls): use svn_client_ls4, only asking for the dirent fields we
> actually need.
>
> * subversion/libsvn_ra_svn/client.c
> (ra_svn_get_dir2): new version of get_dir, pass new dirent_fields
> parameter along to the server.
> (ra_svn_get_dir): pass through to new function, passing SVN_DIRENT_ALL
> for dirent_fields.
> (ra_svn_vtable): add ra_svn_get_dir2.
>
> * subversion/libsvn_ra_dav/ra_dav.h
> (svn_ra_dav__get_dir2): new prototype.
>
> * subversion/libsvn_ra_dav/session.c
> (dav_vtable): add svn_ra_dav__get_dir2.
>
> * subversion/libsvn_ra_dav/fetch.c
> (svn_ra_dav__get_dir2): new version of get_dir, takes a dirent fields
> arg, dynamically figure out which props we need to ask for to fill in
> the dirent fields that were asked for. fall back to asking for all
> of them in the has_props case, since we currently have no other way
> to get that data. only fill in the parts of the dirent we were asked
> to get. create and destroy a new subpool, which is used to allocate
> the temporary neon properties array.

Wow, this part reads like the function's doc string. Sure you want
all that in the log message? :-)

(By the way, my personal nit: If the first letters of sentences are
capitalized, it's a lot easier for me to read. I suspect this is true
of others as well, but don't have any evidence to back that up.)

Okay, on to more substantive comments...

> (svn_ra_dav__get_dir): compat wrapper, passes SVN_DIRENT_ALL as the
> dirent_flags arg to svn_ra_dav__get_dir2.
>
> * subversion/svnserve/serve.c
> (get_dir): accept an optional dirent fields parameter from the client,
> defaulting to SVN_DIRENT_ALL if it's not present. only initialize the
> parts of the dirents that the client asked for.
> ]]]
>
> [...]
>
> --- subversion/include/svn_client.h (revision 17006)
> +++ subversion/include/svn_client.h (working copy)
> @@ -2146,7 +2146,31 @@
> * If @a recurse is true (and @a path_or_url is a directory) this will
> * be a recursive operation.
> *
> + * @a dirent_fields controls which fields in the @c svn_dirent_t's are
> + * filled in. To have them totally filled in use @c SVN_DIRENT_ALL,
> + * otherwise simply binary or together the combination of @c SVN_DIRENT_
> + * fields you care about.
> + *
> + * @since New in 1.4.
> + */

If you say "binary OR" it looks less like a typo. Actually, "bitwise OR"
might be even better.

> --- subversion/libsvn_ra_local/ra_plugin.c (revision 17006)
> +++ subversion/libsvn_ra_local/ra_plugin.c (working copy)
> @@ -947,13 +947,14 @@
>
> /* Getting a directory's entries */
> static svn_error_t *
> -svn_ra_local__get_dir (svn_ra_session_t *session,
> - const char *path,
> - svn_revnum_t revision,
> - apr_hash_t **dirents,
> - svn_revnum_t *fetched_rev,
> - apr_hash_t **props,
> - apr_pool_t *pool)
> +svn_ra_local__get_dir2 (svn_ra_session_t *session,
> + const char *path,
> + svn_revnum_t revision,
> + apr_uint32_t dirent_fields,
> + apr_hash_t **dirents,
> + svn_revnum_t *fetched_rev,
> + apr_hash_t **props,
> + apr_pool_t *pool)
> {
> svn_fs_root_t *root;
> svn_revnum_t youngest_rev;

A "###" comment here indicating that dirent_fields is currently
ignored might be a good idea.

I just realized that, formally, we don't need to use the deprecation
conventions for all these double-underscore functions. But since
they're mapping directly to public APIs, I guess following the same
convention internally is the clearest thing to do, so, yay.

> --- subversion/libsvn_ra_svn/client.c (revision 17006)
> +++ subversion/libsvn_ra_svn/client.c (working copy)
> @@ -1018,11 +1018,13 @@
> return SVN_NO_ERROR;
> }
>
> -static svn_error_t *ra_svn_get_dir(svn_ra_session_t *session, const char *path,
> - svn_revnum_t rev, apr_hash_t **dirents,
> - svn_revnum_t *fetched_rev,
> - apr_hash_t **props,
> - apr_pool_t *pool)
> +static svn_error_t *ra_svn_get_dir2(svn_ra_session_t *session,
> + const char *path, svn_revnum_t rev,
> + apr_uint32_t dirent_fields,
> + apr_hash_t **dirents,
> + svn_revnum_t *fetched_rev,
> + apr_hash_t **props,
> + apr_pool_t *pool)
> {
> ra_svn_session_baton_t *sess_baton = session->priv;
> svn_ra_svn_conn_t *conn = sess_baton->conn;
> @@ -1035,8 +1037,9 @@
> apr_uint64_t size;
> svn_dirent_t *dirent;
>
> - SVN_ERR(svn_ra_svn_write_cmd(conn, pool, "get-dir", "c(?r)bb", path,
> - rev, (props != NULL), (dirents != NULL)));
> + SVN_ERR(svn_ra_svn_write_cmd(conn, pool, "get-dir", "c(?r)bbn", path,
> + rev, (props != NULL), (dirents != NULL),
> + (apr_uint32_t) dirent_fields));
> SVN_ERR(handle_auth_request(sess_baton, pool));
> SVN_ERR(svn_ra_svn_read_cmd_response(conn, pool, "rll", &rev, &proplist,
> &dirlist));

You've extended the protocol to transmit dirent_fields as a number ("n").

This is fine as long as both sides use apr_uint32_t, I guess. But the
doc string for svn_ra_svn_write_cmd() says "n" means apr_uint64_t...
Since dirent_fields is apr_uint32_t everywhere already, did you just
mistype the cast?

But stepping back for a moment: might it not be better just to make
all 'dirent_fields' parameters apr_uint64_t, since that doubles the
amount of room for future expansion? As a minor side benefit, that
would allow us to remove the cast here entirely.

> --- subversion/libsvn_ra_dav/fetch.c (revision 17006)
> +++ subversion/libsvn_ra_dav/fetch.c (working copy)
> @@ -936,6 +939,7 @@
> apr_size_t final_url_n_components;
> svn_ra_dav__session_t *ras = session->priv;
> const char *url = svn_path_url_add_component (ras->url->data, path, pool);
> + apr_pool_t *subpool = svn_pool_create (pool);
>
> /* If the revision is invalid (head), then we're done. Just fetch
> the public URL, because that will always get HEAD. */
> @@ -963,11 +967,82 @@
>
> if (dirents)
> {
> + ne_propname *which_props;
> +
> + /* if we didn't ask for the has_props field, we can get individual
> + properties. */
> + if ((SVN_DIRENT_HAS_PROPS & dirent_fields) == 0)
> + {
> + apr_size_t num_props = 1; /* start with one for the final NULL */
> +
> + if (dirent_fields & SVN_DIRENT_KIND)
> + ++num_props;
> +
> + if (dirent_fields & SVN_DIRENT_SIZE)
> + ++num_props;
> +
> + if (dirent_fields & SVN_DIRENT_CREATED_REV)
> + ++num_props;
> +
> + if (dirent_fields & SVN_DIRENT_TIME)
> + ++num_props;
> +
> + if (dirent_fields & SVN_DIRENT_LAST_AUTHOR)
> + ++num_props;
> +
> + which_props = apr_pcalloc (subpool,
> + num_props * sizeof (ne_propname));
> +
> + /* first, null out the end... */
> + which_props[num_props].nspace = NULL;
> + which_props[num_props--].name = NULL;
> +
> + /* Now, go through and fill in the ones we care about, moving along
> + the array as we go. */
> +
> + if (dirent_fields & SVN_DIRENT_KIND)
> + {
> + which_props[num_props].nspace = "DAV:";
> + which_props[num_props--].name = "resourcetype";
> + }
> +
> + if (dirent_fields & SVN_DIRENT_SIZE)
> + {
> + which_props[num_props].nspace = "DAV:";
> + which_props[num_props--].name = "getcontentlength";
> + }
> +
> + if (dirent_fields & SVN_DIRENT_CREATED_REV)
> + {
> + which_props[num_props].nspace = "DAV:";
> + which_props[num_props--].name = "version-name";
> + }
> +
> + if (dirent_fields & SVN_DIRENT_TIME)
> + {
> + which_props[num_props].nspace = "DAV:";
> + which_props[num_props--].name = "creationdate";
> + }
> +
> + if (dirent_fields & SVN_DIRENT_LAST_AUTHOR)
> + {
> + which_props[num_props].nspace = "DAV:";
> + which_props[num_props--].name = "creator-displayname";
> + }
> +
> + assert (num_props == 0);
> + }

The assertion is a nice touch :-).

> + else
> + {
> + /* get all props, since we need them all to do has_props */
> + which_props = NULL;
> + }
> +

Why the subpool? There's no loop here, does it really save us anything?
True, we allocate an array whose length is proportional to num_props,
but num_props has a low maximum -- it's not like its magnitude is
controlled by the client or anything.

> /* Just like Nautilus, Cadaver, or any other browser, we do a
> PROPFIND on the directory of depth 1. */
> SVN_ERR( svn_ra_dav__get_props(&resources, ras->sess,
> final_url, NE_DEPTH_ONE,
> - NULL, NULL /* all props */, pool) );
> + NULL, which_props, pool) );
>
> /* Count the number of path components in final_url. */
> final_url_n_components = svn_path_component_count(final_url);
> @@ -1000,57 +1075,76 @@
> continue;
>
> entry = apr_pcalloc (pool, sizeof(*entry));
> -
> - /* node kind */
> - entry->kind = resource->is_collection ? svn_node_dir : svn_node_file;
> -
> - /* size */
> - propval = apr_hash_get(resource->propset,
> - SVN_RA_DAV__PROP_GETCONTENTLENGTH,
> - APR_HASH_KEY_STRING);
> - if (propval == NULL)
> - entry->size = 0;
> - else
> - entry->size = svn__atoui64(propval->data);
> -
> - /* does this resource contain any 'svn' or 'custom' properties,
> - i.e. ones actually created and set by the user? */
> - for (h = apr_hash_first (pool, resource->propset);
> - h; h = apr_hash_next (h))
> +
> + if (dirent_fields & SVN_DIRENT_KIND)
> {
> - const void *kkey;
> - void *vval;
> - apr_hash_this (h, &kkey, NULL, &vval);
> + /* node kind */
> + entry->kind = resource->is_collection ? svn_node_dir
> + : svn_node_file;
> + }
> +
> + if (dirent_fields & SVN_DIRENT_SIZE)
> + {
> + /* size */
> + propval = apr_hash_get(resource->propset,
> + SVN_RA_DAV__PROP_GETCONTENTLENGTH,
> + APR_HASH_KEY_STRING);
> + if (propval == NULL)
> + entry->size = 0;
> + else
> + entry->size = svn__atoui64(propval->data);
> + }
> +
> + if (dirent_fields & SVN_DIRENT_HAS_PROPS)
> + {
> + /* does this resource contain any 'svn' or 'custom' properties,
> + i.e. ones actually created and set by the user? */
> + for (h = apr_hash_first (pool, resource->propset);
> + h; h = apr_hash_next (h))
> + {
> + const void *kkey;
> + void *vval;
> + apr_hash_this (h, &kkey, NULL, &vval);
>
> - if (strncmp((const char *)kkey, SVN_DAV_PROP_NS_CUSTOM,
> - sizeof(SVN_DAV_PROP_NS_CUSTOM) - 1) == 0)
> - entry->has_props = TRUE;
> + if (strncmp((const char *)kkey, SVN_DAV_PROP_NS_CUSTOM,
> + sizeof(SVN_DAV_PROP_NS_CUSTOM) - 1) == 0)
> + entry->has_props = TRUE;
>
> - else if (strncmp((const char *)kkey, SVN_DAV_PROP_NS_SVN,
> - sizeof(SVN_DAV_PROP_NS_SVN) - 1) == 0)
> - entry->has_props = TRUE;
> + else if (strncmp((const char *)kkey, SVN_DAV_PROP_NS_SVN,
> + sizeof(SVN_DAV_PROP_NS_SVN) - 1) == 0)
> + entry->has_props = TRUE;
> + }
> + }

Odd, why not just one conditional with an "||" ? <shrug> The shape
of the code wasn't introduced by you, of course.

> --- subversion/svnserve/serve.c (revision 17006)
> +++ subversion/svnserve/serve.c (working copy)
> @@ -1062,9 +1062,10 @@
> svn_fs_root_t *root;
> apr_pool_t *subpool;
> svn_boolean_t want_props, want_contents;
> + apr_uint64_t dirent_fields = SVN_DIRENT_ALL;
>
> - SVN_ERR(svn_ra_svn_parse_tuple(params, pool, "c(?r)bb", &path, &rev,
> - &want_props, &want_contents));
> + SVN_ERR(svn_ra_svn_parse_tuple(params, pool, "c(?r)bb?n", &path, &rev,
> + &want_props, &want_contents, &dirent_fields));

Good, here dirent_fields is an apr_uint64_t, which I guess it needs to
be no matter how the parameter is casted back on the client side.

Nice patch!

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Oct 26 00:03:48 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.