[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] First cut at 1954 solution

From: <kfogel_at_collab.net>
Date: 2004-12-07 03:10:46 CET

VK Sameer <sameer@collab.net> writes:
> First cut at issue 1954 - We should error out on pathnames added or
> imported that we don't accept.
>
> Added checks based on svn_ctype_isutf8(), svn_ctype_iscntrl().
>
> * subversion/include/svn_path.h
> subversion/libsvn_subr/path.c
> (svn_path_is_valid_in_svn): New function
>
> * General comment - added calls to svn_path_is_valid_in_svn()
> in functions listed below
>
> * subversion/libsvn_wc/adm_ops.c
> modified svn_wc_add()
>
> * subversion/libsvn_ra_local/ra_plugin.c
> modified svn_ra_local__do_check_path()
>
> * subversion/libsvn_client/copy.c
> modified wc_to_wc_copy()
>
> * subversion/libsvn_client/add.c
> modified add_dir_recursive(), add_file()
>
> * subversion/libsvn_client/commit.c
> modified import_file(), import_dir()
>
> * subversion/tests/clients/cmdline/commit_tests.py
> pulled tab test code from commit_uri_unsafe() into tab_test()

Run 'svn log' on the Subversion trunk to see examples of our usual log
formatting style. See also the section "Writing log messages" in the
HACKING file.

Here is how I would write it:

   Resolve issue #1954: Error on add or import of a path that is
   invalid in Subversion.

   ####################################################
   ### ###
   ### Note: This is a draft patch for review only, ###
   ### please do not commit it. ###
   ### ###
   ####################################################

   * subversion/include/svn_path.h,
     subversion/libsvn_subr/path.c
     (svn_path_is_valid_in_svn): New function.
  
   * subversion/libsvn_wc/adm_ops.c
     (svn_wc_add): Call svn_path_is_valid_in_svn().
  
   * subversion/libsvn_ra_local/ra_plugin.c
     (svn_ra_local__do_check_path): Same.
  
   * subversion/libsvn_client/copy.c
     (wc_to_wc_copy): Same.
  
   * subversion/libsvn_client/add.c
     (add_dir_recursive, add_file): Same.
  
   * subversion/libsvn_client/commit.c
     (import_file, import_dir): Same.
  
   * subversion/tests/clients/cmdline/commit_tests.py
     (tab_test): New test.
     (test_list): Run it XFail.

Okay, on to the patch itself...

> Index: subversion/include/svn_path.h
> ===================================================================
> --- subversion/include/svn_path.h (revision 12179)
> +++ subversion/include/svn_path.h (working copy)
> @@ -338,6 +338,22 @@
> svn_boolean_t svn_path_is_backpath_present (const char *path);
>
>
> +/**
> + * @since New in 1.2.
> + *
> + * Test to see if a path is allowed in a Subversion repository.
> + * First check whether all characters are valid in UTF8
> + * Next check whether all characters are non-control characters
> + * Currently, all other paths are allowed.
> + *
> + * This is OS-independent - it is used to decide what paths may be
> + * added to a repository. Adding occurs either using a file on disk
> + * (svn import/add) or one not on disk (svn mv/copy URL1 URL2).
> + */
> +svn_error_t *svn_path_is_valid_in_svn (const char *path,
> + apr_size_t len,
> + apr_pool_t *pool);
> +

(My comments will be similar to what others have said already in
response to this patch.)

Your doc string describes the implementation. Instead, it should
describe the *semantics* -- the behavior of the function. For
example, what happens if the path is not valid? Is an error returned,
and if so, which error code?

Here's a new interface, with a new doc string:

   /**
    * @since New in 1.2.
    *
    * Set @a *is_valid to TRUE if @a path is a valid Subversion path,
    * else set it to FALSE. Use @a pool for temporary allocation.
    *
    * Valid Subversion paths are UTF-8 strings with no control characters
    * except for SPACE (0x20). "Valid" means Subversion can store
    * it in a repository. There may be other, OS-specific limitations
    * on what paths can be represented in a working copy.
    */
    svn_error_t *svn_path_is_valid (svn_boolean_t *is_valid,
                                    const char *path,
                                    apr_pool_t *pool);

(Note also that I shortened the function name.)

I'm not sure whether you need to return 'svn_error_t *' at all or not
-- it partly depends on what you'll be calling. If it turns out that
an error can happen, then add a paragraph describing the
circumstances, and (ideally) the error codes.

The reason I changed the prototype is that the path being invalid is
*not* an error :-). After all, determining whether or not a path is
valid is exactly what function is meant do. So if it successfully
determines that, that shouldn't be an error condition.

More practically, every svn_error_t we manufacture is allocated in a
global pool. If someone were to call your function many times on
invalid paths, then we'd have needless memory growth.

So both from an interface consistency standpoint and a performance
standpoint, returning the answer by reference as a boolean is better.
(But see below about offering a wrapper function for convenience.)

> /** Test if @a path2 is a child of @a path1.
> * If not, return @c NULL.
> * If so, return a copy of the remainder path, allocated in @a pool.
> Index: subversion/libsvn_wc/adm_ops.c
> ===================================================================
> --- subversion/libsvn_wc/adm_ops.c (revision 12179)
> +++ subversion/libsvn_wc/adm_ops.c (working copy)
> @@ -879,6 +879,10 @@
> svn_node_kind_t kind;
> apr_uint32_t modify_flags = 0;
> svn_wc_adm_access_t *adm_access;
> +
> + SVN_ERR (svn_path_is_valid_in_svn (path,
> + strlen(path),
> + pool));

Naturally, all of these calls would have to change to use some
'&boolean_var'.

But since in most call sites you just want to error anyway, it would
make sense to have a shared wrapper:

   /**
    * @since New in 1.2.
    *
    * Return SVN_ERR_FS_PATH_SYNTAX if @a path is not a valid
    * Subversion path, else return SVN_NO_ERROR. See
    * @c svn_path_is_valid() for more about valid paths.
    */
    svn_error_t *svn_path_error_if_invalid (const char *path,
                                            apr_pool_t *pool);

The reason to have both the return-by-reference interface and the
error-returning wrapper is so that there's a way to ask the question
without generating an error (which is useful in some circumstances)
and a way to ask it and get an error automatically (which is useful in
other circumstances).

Naturally, most of the calls in this patch are the second kind of
circumstance. But when offering a predicate API, I think it's good to
offer a non-erroring way to answer the question too.

> /* Make sure something's there. */
> SVN_ERR (svn_io_check_path (path, &kind, pool));
> Index: subversion/libsvn_subr/path.c
> ===================================================================
> --- subversion/libsvn_subr/path.c (revision 12179)
> +++ subversion/libsvn_subr/path.c (working copy)
> @@ -29,6 +29,7 @@
> #include "svn_private_config.h" /* for SVN_PATH_LOCAL_SEPARATOR */
> #include "svn_utf.h"
> #include "svn_io.h" /* for svn_io_stat() */
> +#include "svn_ctype.h" /* for svn_ctype_() */

You don't need to say "for svn_ctype_()", it's obvious from the header
name.

> /* The canonical empty path. Can this be changed? Well, change the empty
> @@ -1248,3 +1249,25 @@
> else
> return svn_utf_cstring_to_utf8 (path_utf8, path_apr, pool);
> }
> +
> +svn_error_t*
> +svn_path_is_valid_in_svn (const char *path,
> + apr_size_t len,
> + apr_pool_t *pool)
> +{
> + int i;
> + for (i = 0; i < len; i++)
> + {
> + if (!svn_ctype_isutf8(path[i]))
> + return svn_error_createf(SVN_ERR_FS_PATH_SYNTAX, NULL,
> + "Invalid UTF8 character in '%s'",
> + svn_path_local_style (path, pool));
> +
> + if (svn_ctype_iscntrl(path[i]))
> + return svn_error_createf(SVN_ERR_FS_PATH_SYNTAX, NULL,
> + "Invalid control character in '%s'",
> + svn_path_local_style (path, pool));
> + }
> +
> + return SVN_NO_ERROR;
> +}

You've already seen Philip's comments about testing byte-by-byte.
UTF-8 is a multibyte encoding. I'm not sure how familiar you are with
it, but http://svn.red-bean.com/repos/main/3bits/utf8_xml.txt
summarizes how it works. I'm posting it here because I've been hoping
Brane or someone would proofread it anyway :-).

Anyway, you should use the existing svn_utf_* functions, and even
write new ones if you need them. Then svn_path_is_valid() can just
marshal the appropriate svn_utf_* and svn_ctype_* calls.

(The signatures of the svn_utf_* and svn_ctype_* will determine
whether you still need to return 'svn_error_t *' or not.)

> Index: subversion/libsvn_ra_local/ra_plugin.c
> ===================================================================
> --- subversion/libsvn_ra_local/ra_plugin.c (revision 12179)
> +++ subversion/libsvn_ra_local/ra_plugin.c (working copy)
> @@ -687,7 +687,15 @@
> svn_ra_local__session_baton_t *sbaton = session_baton;
> svn_fs_root_t *root;
> const char *abs_path = sbaton->fs_path;
> + svn_error_t *err;
>
> + if (path && (err = svn_path_is_valid_in_svn (path,
> + strlen(path),
> + pool)))
> + return svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, err,
> + _("Invalid pathname '%s'"),
> + svn_path_local_style (path, pool));

This may not matter after the rewrite, but just for the future:

Why wrap the error in another error that contains basically the same
information? Just "return err;" :-).

> --- subversion/libsvn_client/copy.c (revision 12179)
> +++ subversion/libsvn_client/copy.c (working copy)
> @@ -74,6 +74,10 @@
> svn_wc_adm_access_t *adm_access, *src_access;
> svn_error_t *err;
>
> + /* Verify path has no invalid characters */
> + SVN_ERR (svn_path_is_valid_in_svn (src_path, strlen(src_path),
> + pool));
> +
> /* Verify that SRC_PATH exists. */
> SVN_ERR (svn_io_check_path (src_path, &src_kind, pool));
> if (src_kind == svn_node_none)
> Index: subversion/libsvn_client/add.c
> ===================================================================
> --- subversion/libsvn_client/add.c (revision 12179)
> +++ subversion/libsvn_client/add.c (working copy)
> @@ -209,6 +209,11 @@
> svn_node_kind_t kind;
> svn_boolean_t is_special;
>
> + /* Check for invalid pathname characters */
> + SVN_ERR (svn_path_is_valid_in_svn (path,
> + strlen(path),
> + pool));
> +
> /* add the file */
> SVN_ERR (svn_wc_add (path, adm_access, NULL, SVN_INVALID_REVNUM,
> ctx->cancel_func, ctx->cancel_baton,
> @@ -276,6 +281,11 @@
> svn_wc_adm_access_t *dir_access;
> apr_array_header_t *ignores;
>
> + /* Check for invalid pathname characters */
> + SVN_ERR (svn_path_is_valid_in_svn (dirname,
> + strlen(dirname),
> + pool));
> +
> /* Check cancellation; note that this catches recursive calls too. */
> if (ctx->cancel_func)
> SVN_ERR (ctx->cancel_func (ctx->cancel_baton));
> Index: subversion/libsvn_client/commit.c
> ===================================================================
> --- subversion/libsvn_client/commit.c (revision 12179)
> +++ subversion/libsvn_client/commit.c (working copy)
> @@ -176,6 +176,9 @@
> svn_node_kind_t kind;
> svn_boolean_t is_special;
>
> + /* validate pathname characters */
> + SVN_ERR (svn_path_is_valid_in_svn (path, strlen (path), pool));
> +
> SVN_ERR (svn_io_check_special_path (path, &kind, &is_special, pool));
>
> if (kind == svn_node_unknown)
> @@ -276,6 +279,9 @@
> apr_hash_index_t *hi;
> apr_array_header_t *ignores;
>
> + /* validate pathname characters */
> + SVN_ERR (svn_path_is_valid_in_svn (path, strlen (path), pool));
> +

I don't think you need these comments explaining each call anyway (the
name of the function does that quite well). But if you are going to
put a comment, then write it in terms of semantics, not
implementation, i.e.,

   /* Make sure this path is valid. */

instead of

   /* valid pathname characters */

The latter describes exactly how the function works. People don't
need to know those details, they just need to know what it means in
the context of the calling code :-).

> SVN_ERR (svn_wc_get_default_ignores (&ignores, ctx->config, pool));
>
> SVN_ERR (svn_io_get_dirents (&dirents, path, pool));
> @@ -427,6 +433,7 @@
> apr_array_header_t *batons = NULL;
> const char *edit_path = "";
>
> + SVN_ERR (svn_io_check_path (path, &kind, pool));
> /* Get a root dir baton. We pass an invalid revnum to open_root
> to mean "base this on the youngest revision". Should we have an
> SVN_YOUNGEST_REVNUM defined for these purposes? */

You appear to have added this call to svn_io_check_path(). Is it part
of some other change? It's not mentioned in the log message.

I haven't thought carefully about how to write the regression tests
yet, so no comment on that part of the change for now.

I also haven't thought carefully about the call sites. When the next
revision of the patch comes in, I'll look at those in more detail.

Thanks! And have patience... several rounds of review are not
unusual :-).

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Tue Dec 7 03:16:34 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.