[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] issue 1954 - try 2

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2004-12-10 02:31:46 CET

VK Sameer <sameer@collab.net> writes:

> --- subversion/include/svn_path.h (revision 12257)
> +++ subversion/include/svn_path.h (working copy)
> @@ -356,6 +356,39 @@
> const char *path2,
> apr_pool_t *pool);
>
> +/**
> + * @since New in 1.2.
> + *
> + * Check whether @a path is a valid Subversion path
> + * @a pool is used for temporary allocation
> + * @a check_utf8 turns on additional checks
> + *
> + * A valid Subversion pathname is a UTF-8 string without control
> + * characters, except for SPACE (0x20). "Valid" means Subversion
> + * can store the pathname in a repository. There may be other
> + * OS-specific limitations on what paths can be represented in
> + * a working copy.
> + */
> +svn_boolean_t svn_path_is_valid (const char *path,
> + apr_pool_t *pool,
> + svn_boolean_t check_utf8);

Most of our interfaces have pool as the last parameter.

> +
> +/**
> + * @since New in 1.2.
> + *
> + * Check whether @a path is a valid Subversion path and generate
> + * error if invalid. See docstring of svn_path_is_valid() for
> + * definition of valid pathname
> + * @a pool is used for temporary allocation
> + * @a check_utf8 turns on additional checks
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns SVN_ERR_FS_PATH_SYNTAX if invalid
> + */
> +svn_error_t *svn_path_error_if_invalid (const char *path,
> + apr_pool_t *pool,
> + svn_boolean_t check_utf8);

The function name svn_path_error_if_invalid is ugly.

> +
>
> /** URI/URL stuff
> *
> Index: subversion/include/svn_utf.h
> ===================================================================
> --- subversion/include/svn_utf.h (revision 12257)
> +++ subversion/include/svn_utf.h (working copy)
> @@ -171,6 +171,16 @@
> const svn_string_t *src,
> apr_pool_t *pool);
>
> +/** @since New in 1.2
> + *
> + * Public interface around check_cstring_utf8, which checks

The public don't care about check_cstring_utf8, so if that function
has any important documentation it should be here.

> + * whether the NULL-terminated sequence @a DATA is valid UTF-8
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns APR_EINVAL if invalid

I don't think Subversion should generate APR_XXX errors, although
that's probably the existing code rather than your stuff.

> + */
> +svn_error_t *svn_utf_check_cstring_utf8 (const char *data, apr_pool_t *pool);

That makes three new public API functions just to check paths!

> +
> #ifdef __cplusplus
> }
> #endif /* __cplusplus */
> Index: subversion/libsvn_subr/utf.c
> ===================================================================
> --- subversion/libsvn_subr/utf.c (revision 12257)
> +++ subversion/libsvn_subr/utf.c (working copy)
> @@ -476,6 +476,19 @@
> return SVN_NO_ERROR;
> }
>
> +/* @since New in 1.2
> + *
> + * Public interface around check_cstring_utf8, which checks
> + * whether the NULL-terminated sequence @a DATA is valid UTF-8
> + *
> + * returns SVN_NO_ERROR if valid
> + * returns APR_EINVAL if invalid
> + */

Better not to duplicate the docstring.

> +svn_error_t *
> +svn_utf_check_cstring_utf8 (const char *data, apr_pool_t *pool)
> +{
> + return (check_cstring_utf8 (data, pool));

Is there any point keeping check_cstring_utf8?

> +}
>
> svn_error_t *
> svn_utf_stringbuf_to_utf8 (svn_stringbuf_t **dest,
> Index: subversion/libsvn_subr/path.c
> ===================================================================
> --- subversion/libsvn_subr/path.c (revision 12257)
> +++ subversion/libsvn_subr/path.c (working copy)
> @@ -29,6 +29,7 @@
> #include "svn_private_config.h" /* for SVN_PATH_LOCAL_SEPARATOR */
> #include "svn_utf.h"
> #include "svn_io.h" /* for svn_io_stat() */
> +#include "svn_ctype.h"
>
>
> /* The canonical empty path. Can this be changed? Well, change the empty
> @@ -1248,3 +1249,69 @@
> else
> return svn_utf_cstring_to_utf8 (path_utf8, path_apr, pool);
> }
> +
> +/* catchall function that performs various checks on path
> + * and fills out params based on switches.
> + * If check_utf8 is TRUE, validates path for utf8 correctness
> + * Otherwise, only checks for existence of control chars.
> + * If generate_error is TRUE and path is invalid, an error message
> + * is returned in err
> + * (created to avoid repeating code in two functions,
> + * one which returns an error message and one which doesn't)
> + */
> +static svn_boolean_t
> +validate_svn_path (const char *path, apr_pool_t *pool,
> + svn_boolean_t check_utf8,
> + svn_boolean_t generate_error,
> + svn_error_t *err)

Huh? You are passing an error in?

> +{
> + const char *c = path;

I don't like initialisation spread out like this for no reason, for()
statements have a perfectly good initialisation clause.

> +
> + if (check_utf8)
> + {
> + svn_error_t *utf_err = svn_utf_check_cstring_utf8 (path, pool);
> + if (utf_err)
> + {
> + if (generate_error)
> + err = svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, utf_err,
> + "Invalid UTF-8 in path: '%s'",
> + svn_path_local_style (path, pool));

You can't print that path it's invalid UTF-8. It's academic since the
error never gets returned.

> + else
> + svn_error_clear (utf_err);
> +
> + return FALSE;
> + }
> + }
> +
> + for (; *c; c++)
> + if (svn_ctype_iscntrl(*c))
> + {
> + if (generate_error)
> + err = svn_error_createf (SVN_ERR_FS_PATH_SYNTAX, NULL,
> + "Invalid character '%x' in path '%s'",
> + *c,

char could be signed, control characters are probably less than
SCHAR_MAX but why take the risk? Cast to unsigned instead.

> + svn_path_local_style (path, pool));
> + return FALSE;
> + }
> +
> + return TRUE;
> +}
> +
> +svn_boolean_t
> +svn_path_is_valid (const char *path, apr_pool_t *pool,
> + svn_boolean_t check_utf8)
> +{
> + svn_error_t *err = SVN_NO_ERROR;

Pointless initialisation.

> +
> + return validate_svn_path (path, pool, check_utf8, FALSE, err);
> +}
> +
> +svn_error_t*
> +svn_path_error_if_invalid (const char *path, apr_pool_t *pool,
> + svn_boolean_t check_utf8)
> +{
> + svn_error_t* err = SVN_NO_ERROR;
> +
> + validate_svn_path (path, pool, check_utf8, TRUE, err);
> + return err;

This always returns SVN_NO_ERROR.

> +}

The whole interface looks slightly odd, but I can't really put my
finger on the problem. Do we really need two functions? Do we really
need the check_utf8 flag?

The reason to have a function return a boolean is because we expect
the "error" to occur, we intend to handle it, and we don't want the
overhead of creating and destroying an error. I don't think that
applies to this stuff, the error is likely to be irrecoverable. So I
guess we don't need the boolean function.

I'm guessing we don't need the check_utf8 flag. You have already
added a new public interface to the existing check-UTF-8 function, so
all we really need is a new check-control-character-function, one that
returns an error. Where it is know that the path is UTF-8 just call
the second, if the path is not known to be UTF-8 call both.

-- 
Philip Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Dec 10 03:00:22 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.