character set handling (was: Re: [proposal] --targets command line option)

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-02-28 20:55:50 CET

On Thu, Feb 28, 2002 at 10:12:44AM -0800, Zack Weinberg wrote:
>...
> At least on Unix, there is nothing which prevents a file name from
> containing newlines.

[ per other responses on this issue, SVN's answer is "TFB" ]

>...
> Another option is for Subversion to reject any operation which would
> create a path name containing dangerous characters. The trouble with
> that is, the set of dangerous characters depends on the locale and
> character set in use. For instance, under normal conditions ESC is a
> dangerous character, but users of various CJK character sets may well
> expect to be able to stick ESC in file names (without even being aware
> that that's what they're doing).

The design intent is for SVN core to use UTF-8 pathnames throughout.
Essentially, when you hit the APIs, you have paths that use the UTF-8
character set, and use forward slashes.

At the moment, we do a bit (any?) of conversion of Windows' backslashes, but
nobody does character set conversion (and we certainly don't do checks). The
conversion process is going to be kind of an interesting problem: what is
the incoming character set? Once we know that, we need conversion
functions (apr-iconv, maybe).
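
To make the conversion step concrete, here is a minimal Python sketch. It is
not Subversion code and does not use apr-iconv; the function name
to_internal_path and its parameters are hypothetical, and it assumes the
caller already knows the incoming character set, which is exactly the open
question above.

```python
def to_internal_path(raw: bytes, source_charset: str,
                     windows_separators: bool = False) -> bytes:
    """Convert a path in a known charset to UTF-8 with forward slashes.

    Hypothetical helper for illustration only.
    """
    # Decode first: in some multibyte charsets (Shift_JIS, for example) the
    # byte 0x5C can be the second byte of a character, so separators are
    # replaced on decoded text rather than on raw bytes.
    text = raw.decode(source_charset)
    if windows_separators:
        text = text.replace("\\", "/")
    return text.encode("utf-8")


# A Latin-1 encoded Windows-style path becomes a UTF-8 path with "/".
print(to_internal_path(b"docs\\caf\xe9.txt", "latin-1",
                       windows_separators=True))
# prints b'docs/caf\xc3\xa9.txt'
```

Making the charset an explicit argument keeps the guess about the incoming
encoding in one place, outside the conversion itself.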

Note that once we have UTF-8 internally, a bunch of stuff becomes easier for
us: UTF-8 is the "native" charset for XML, so using it there will "just
work".

Also note that (today) we don't do the conversion on input, which causes
problems for i18n users. When they use a character with the 8th bit set,
things break because it is fed into a UTF-8 character parser (which
promptly declares it an illegal UTF-8 sequence).
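
The failure mode is easy to reproduce outside of SVN. The sketch below uses
Python's built-in UTF-8 decoder as a stand-in for the parser; is_valid_utf8
is a hypothetical helper name, and the file name is made up for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
latin1_name = "caf\u00e9".encode("latin-1")  # b'caf\xe9': one high-bit byte
utf8_name = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

print(is_valid_utf8(latin1_name))  # prints False
print(is_valid_utf8(utf8_name))    # prints True
```

The Latin-1 byte 0xE9 looks like the start of a three-byte UTF-8 sequence
with nothing after it, so a strict decoder rejects it.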

Back to your original point: I don't know that we need to check for
"dangerous" pathnames. Personally, I'd be happy to disallow all characters
below U+0020 (space).
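
A check like that is small once paths are decoded text. This is only a
sketch of the idea, not an existing SVN function; has_control_chars is a
hypothetical name, and it assumes the path has already been converted from
UTF-8 bytes into a string.

```python
def has_control_chars(path: str) -> bool:
    """Return True if any character in the path is below U+0020."""
    return any(ord(ch) < 0x20 for ch in path)


print(has_control_chars("src/main.c"))     # prints False
print(has_control_chars("bad\nname.txt"))  # prints True (newline is U+000A)
print(has_control_chars("esc\x1bname"))    # prints True (ESC is U+001B)
print(has_control_chars("my file.txt"))    # prints False (space is allowed)
```

Because the test runs on code points rather than bytes, it does not depend
on the locale or on the character set the user typed the name in.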

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:37:10 2006

This is an archived mail posted to the Subversion Dev mailing list.
