[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Issue #2748: non-UTF-8 filenames in the repository

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Wed, 17 Sep 2008 18:28:30 +0300 (Jerusalem Daylight Time)

> > Index: subversion/include/svn_path.h
> > ===================================================================
> > --- subversion/include/svn_path.h (revision 33124)
> > +++ subversion/include/svn_path.h (working copy)
> > @@ -437,16 +437,27 @@ svn_path_is_ancestor(const char *path1, const char
> > * a repository. There may be other, OS-specific, limitations on
> > * what paths can be represented in a working copy.
> > *
> > - * ASSUMPTION: @a path is a valid UTF-8 string. This function does
> > - * not check UTF-8 validity.
> > + * When @a assume_utf8 is @c TRUE, assume that @a path is a valid
> > + * UTF-8 string and do not check UTF-8 validity.
> [...]
> > -svn_error_t *svn_path_check_valid(const char *path, apr_pool_t *pool);
> > +svn_error_t *
> > +svn_path_check_valid2(const char *path, svn_boolean_t assume_utf8,
> > + apr_pool_t *pool);
>
> This is ugly! Our main "validate a path" API should not have an option
> for allowing invalid paths.
>

When assume_utf8 is False, the new API is equivalent to the existing one
(prior to the revving), and when assume_utf8 is True, the new API
explicitly checks for valid UTF-8 (and errors if it is invalid).

So, if that option is here, it has been here since 1.2 at least (when
svn_path_check_valid was added) -- it's not something I introduced.

> When an API says it assumes and does not check something, I understand
> that to mean it's a requirement on the caller, not that it's permissible
> to violate the assumption. I understand "does not check" as being a note
> about the current implementation, to reinforce the message that it's the
> caller's responsibility.
>

Agreed. And I only added a flag that makes the function do that check for
the caller. It was just for convenience (read: laziness). Existing
callers that do the checking themselves (they follow the API, right?) can
pass False. New callers can pass True instead of coding the check
themselves.

We could make the function check UTF-8 always (removing the need for the
assumption), never (relying on the assumption), or with a flag (for lazy
callers). And either way we'll need to fix some callers.

Daniel

> May I suggest something like this:
>
> - Change the implementation of svn_path_check_valid() to validate for
> UTF-8. (I see this as enforcing a requirement, so a backward-compatible
> change even though some callers will be bitten by it.
> )
>
> - Any callers that were doing their own UTF-8 validation prior to
> calling this: remove that, as it will now be redundant.
>
> - Change the callers that we know might need to handle non-UTF8 paths
> (such as the ones that are reading from a possibly-invalid repository
> history) to handle an error returned by this newly strengthened
> validation. When they encounter this error, they should call a private
> semi-validation function that behaves like the old version of
> svn_path_check_valid().
>
> /* [Check the path...]
> * @note Prior to version 1.6 this function did not check but only
> * assumed that @a path was a valid UTF-8 string. */
> svn_error_t *
> svn_path_check_valid(const char *path, apr_pool_t *pool);
>
> /* Like svn_path_check_valid() except @a path need not be a valid
> * UTF-8 string. */
> svn_error_t *
> svn_path__check_valid_except_utf8(const char *path, apr_pool_t *pool);
>
> (Note that the language here is "need not be valid UTF-8" rather than
> "assume it's valid but do not check".)
>
> Making sense?
>
> - Julian
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-09-17 17:28:48 CEST

This is an archived mail posted to the Subversion Dev mailing list.