Re: UTF-8

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-05-23 17:55:14 CEST

(Thanks to Daniel Sternberg for the alternate mailing list archive
pointer. I'll bookmark it.)

On Thu, 2002-05-23 at 10:03, Marcus Comstedt wrote:
> Incentive is one thing. But Subversion is not in a position to
> _demand_ such a thing.

By itself, no. But I think it's reasonable for application developers
not to incur the hair of charset conversions when there is a superior
approach available to the world.

> If you want to use UTF-8 under Unix, all you have to do is select a
> UTF-8 locale and the conversions will be identity conversions.

I note that wasn't the default you chose.

> It should be up to each individual user to choose what kind of
> "advantage of Unicode" he would prefer.

The "advantage" I was talking about was not having to do character set
conversion in applications, not any particular advantage to the user.
(Although the user also benefits indirectly from having a single
character set for all languages.)

> Although it's not 100% accurate to say that UTF-8 validity of strings
> need not be enforced, since strings are being put in XML files without
> charset declarations, and such XML files must conform to UTF-8
> validity rules.

It's true, anything which shows up in XML and isn't encoded will be
checked for UTF-8 validity by expat. As far as I know, the user-visible
objects which meet this criterion are filenames and property names.
Property values and file data only show up in XML after being
base64-encoded, as far as I am aware.
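
For reference, the kind of validity check at issue looks roughly like the
sketch below. This is not expat's code: demo_utf8_valid() is a
hypothetical name, and the function only applies the usual UTF-8
well-formedness rules (no overlong forms, no surrogates, nothing above
U+10FFFF).

```c
#include <stddef.h>

/* Hypothetical helper, not expat's implementation.
   Returns 1 if the len bytes at s are well-formed UTF-8, else 0. */
int demo_utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;   /* code point decoded so far */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        if (len - i <= extra)
            return 0;                   /* sequence is truncated */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

A Latin-1 filename containing a lone 0xE9 byte fails this check, which is
why unconverted native filenames cannot be dropped into the XML as-is.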

I'm guessing property names aren't a big issue. If the issue is mainly
filenames, then it might be okay to handle conversion, if you do so by
wrapping an svn_file_open() around apr_file_open(). Don't do it by
adding a conversion step before every file open call.
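
To make the wrapper idea concrete, here is a minimal sketch in plain C.
It uses fopen() as a stand-in for apr_file_open(), and the names
demo_file_open() and demo_utf8_to_native() are hypothetical, not
Subversion or APR API. Purely for illustration, it assumes the native
charset is ISO-8859-1; real code would have to consult the locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert a UTF-8 path to the native charset,
   assumed here to be ISO-8859-1. Returns 0 on success, or -1 if the
   input is malformed, not representable, or does not fit in out. */
int demo_utf8_to_native(const char *utf8, char *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;

    assert(utf8 != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    while (*p != '\0') {
        unsigned char c;

        if (*p < 0x80) {
            c = *p++;                       /* ASCII passes through */
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequences for U+0080..U+00FF map onto Latin-1. */
            c = (unsigned char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;                      /* outside Latin-1 or malformed */
        }

        if (n + 1 >= outlen)
            return -1;                      /* no room for byte plus NUL */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return 0;
}

/* Hypothetical wrapper: the single place where path conversion happens.
   Callers pass UTF-8 paths and never deal with the native charset. */
FILE *demo_file_open(const char *utf8_path, const char *mode)
{
    char native[4096];

    if (demo_utf8_to_native(utf8_path, native, sizeof native) != 0)
        return NULL;
    return fopen(native, mode);
}
```

The point of the design is that every caller goes through
demo_file_open(), so no call site needs its own conversion step.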

> Conversion needed:
> Messages printed to stdout/stderr or non-XML logfiles need
> conversion

Does this apply to the libraries? A library function which prints to
stdout/stderr is useless for GUI programs, so we generally try to avoid
that. And I don't think we do much in the way of logging.

Anyway, messages presumably need to be localized, not just
charset-converted. (If the message contains a filename, the filename
might need to be charset-converted.) It doesn't add much value to
charset-convert them, other than perhaps filenames, without doing
localization as well.

> Name service calls such as getpwnam need conversion
> Command line arguments passed to exec need conversion

I don't see a reason why we should be mucking around in any way with
usernames and command-line arguments provided by the operating system or

Received on Thu May 23 17:56:21 2002

This is an archived mail posted to the Subversion Dev mailing list.