[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Issue #2748: non-UTF-8 filenames in the repository

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 17 Sep 2008 15:17:01 +0100

On Wed, 2008-09-17 at 15:42 +0300, Daniel Shahaf wrote:
> Patch for issue #2748 ("clients can create non UTF-8 filenames in the
> repository"). It prevents non-UTF-8 paths from entering the repository,
> but it doesn't affect existing files in existing repositories, svnsync,
> or 'svnadmin load'.

That sounds great as a functional change.

I have a question about one of the API changes...

> I'm running 'make check' now, and I'll commit later today or tomorrow if
> I don't hear anything.
>
> Daniel
>
> [[[
> Validate in libsvn_repos that paths added to the repository are valid UTF-8.
> This fixes issue #2748.
>
> TODO: update callers of svn_path_check_valid
>
> * subversion/libsvn_repos/commit.c
> (add_directory, add_file):
> Validate incoming paths for UTF-8 using svn_path_check_valid2().
>
> * subversion/include/svn_path.h
> (svn_path_check_valid2): New.
> (svn_path_check_valid): Deprecate.
>
> * subversion/libsvn_subr/path.c
> (utf_impl.h): Include.
> (svn_path_check_valid2): New.
> (svn_path_check_valid): Deprecate.
>
> * subversion/tests/libsvn_repos/repos-test.c
> (prop_validation_commit_with_revprop): Renamed to...
> (commit_path_with_revprops): ... this.
> Also, add KIND parameter, and make PROP_KEY optional.
> (prop_validation):
> Track rename and extension of prop_validation_commit_with_revprop().
> (paths_are_UTF_8): New test.
> (test_funcs): Run paths_are_UTF_8 as PASS.
> ]]]

[...]
> Index: subversion/include/svn_path.h
> ===================================================================
> --- subversion/include/svn_path.h (revision 33124)
> +++ subversion/include/svn_path.h (working copy)
> @@ -437,16 +437,27 @@ svn_path_is_ancestor(const char *path1, const char
> * a repository. There may be other, OS-specific, limitations on
> * what paths can be represented in a working copy.
> *
> - * ASSUMPTION: @a path is a valid UTF-8 string. This function does
> - * not check UTF-8 validity.
> + * When @a assume_utf8 is @c TRUE, assume that @a path is a valid
> + * UTF-8 string and do not check UTF-8 validity.
[...]
> -svn_error_t *svn_path_check_valid(const char *path, apr_pool_t *pool);
> +svn_error_t *
> +svn_path_check_valid2(const char *path, svn_boolean_t assume_utf8,
> + apr_pool_t *pool);

This is ugly! Our main "validate a path" API should not have an option
for allowing invalid paths.

When an API says it assumes and does not check something, I understand
that to mean it's a requirement on the caller, not that it's permissible
to violate the assumption. I understand "does not check" as being a note
about the current implementation, to reinforce the message that it's the
caller's responsibility.

May I suggest something like this:

  - Change the implementation of svn_path_check_valid() to validate for
UTF-8. (I see this as enforcing a requirement, so a backward-compatible
change even though some callers will be bitten by it.)

  - Any callers that were doing their own UTF-8 validation prior to
calling this: remove that, as it will now be redundant.

  - Change the callers that we know might need to handle non-UTF8 paths
(such as the ones that are reading from a possibly-invalid repository
history) to handle an error returned by this newly strengthened
validation. When they encounter this error, they should call a private
semi-validation function that behaves like the old version of
svn_path_check_valid().

/* [Check the path...]
 * @note Prior to version 1.6 this function did not check but only
 * assumed that @a path was a valid UTF-8 string. */
svn_error_t *
svn_path_check_valid(const char *path, apr_pool_t *pool);

/* Like svn_path_check_valid() except @a path need not be a valid
 * UTF-8 string. */
svn_error_t *
svn_path__check_valid_except_utf8(const char *path, apr_pool_t *pool);

(Note that the language here is "need not be valid UTF-8" rather than
"assume it's valid but do not check".)

Making sense?

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-09-17 16:17:20 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.