(Thanks to Daniel Sternberg for the alternate mailing list archive
pointer. I'll bookmark it.)
On Thu, 2002-05-23 at 10:03, Marcus Comstedt wrote:
> Incentive is one thing. But Subversion is not in a position to
> _demand_ such a thing.
By itself, no. But I think it's reasonable for application developers
not to incur the hair of charset conversions when there is a superior
approach available to the world.
> If you want to use UTF-8 under Unix, all you have to do is select a
> UTF-8 locale and the conversions will be identity conversions.
I note that wasn't the default you chose.
> It should be up to each individual user to choose what kind of
> "advantage of Unicode" he would prefer.
The "advantage" I was talking about was not having to do character set
conversion in applications, not any particular advantage to the user.
(Although the user also benefits indirectly from having a single
character set for all languages.)
> Although it's not 100% accurate to say that UTF-8 validity of strings
> need not be enforced, since strings are being put in XML files without
> charset declarations, and such XML files must conform to UTF-8
> validity rules.
It's true, anything which shows up in XML and isn't encoded will be
checked for UTF-8 validity by expat. As far as I know, the user-visible
objects which meet this criterion are filenames and property names.
Property values and file data only show up in XML after being
base64-encoded, as far as I am aware.
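To make the filename concern concrete, here is a minimal sketch of a UTF-8 well-formedness check. It is not expat's actual code, and is_valid_utf8() is a hypothetical name I made up for illustration. It only shows why a filename taken verbatim from a Latin-1 locale would generally be rejected when it lands unencoded in an XML file that is assumed to be UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of expat or Subversion).
   Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp, min;
        size_t extra, k;

        if (c < 0x80) {                       /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {      /* 2-byte sequence */
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {      /* 3-byte sequence */
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {      /* 4-byte sequence */
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                         /* stray continuation or bad lead byte */
        }

        if (len - i - 1 < extra)
            return 0;                         /* sequence is truncated */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                     /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

With this, the UTF-8 bytes for "café" (63 61 66 C3 A9) pass, while the Latin-1 bytes for the same name (63 61 66 E9) fail because E9 announces a three-byte sequence that never arrives.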
I'm guessing property names aren't a big issue. If the issue is mainly
filenames, then it might be okay to handle conversion, if you do so by
wrapping an svn_file_open() around apr_file_open(). Don't do it by
adding a conversion step before every file open call.
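Roughly what I have in mind, as a sketch only: one wrapper owns the conversion and every caller goes through it. To keep the example self-contained I use fopen() as a stand-in for apr_file_open(), and a toy UTF-8 to ISO-8859-1 converter as a stand-in for whatever real charset conversion would be used. The names sketch_file_open() and utf8_to_latin1() are hypothetical, not existing Subversion or APR functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a real charset converter: UTF-8 to ISO-8859-1.
   Returns 0 on success, -1 if the input is malformed, contains a
   character outside Latin-1, or does not fit in the output buffer. */
int utf8_to_latin1(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    while (*p) {
        unsigned char c;

        if (*p < 0x80) {
            c = *p++;                          /* ASCII passes through */
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequences for U+0080..U+00FF map to one byte. */
            c = (unsigned char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            return -1;                         /* malformed or not in Latin-1 */
        }
        if (o + 1 >= outsize)
            return -1;                         /* no room for byte plus NUL */
        out[o++] = (char)c;
    }
    if (outsize == 0)
        return -1;
    out[o] = '\0';
    return 0;
}

/* Hypothetical wrapper: callers always pass UTF-8 paths, and the
   conversion to the native charset happens in exactly one place. */
FILE *sketch_file_open(const char *utf8_path, const char *mode)
{
    char native[4096];

    assert(utf8_path != NULL && mode != NULL);
    if (utf8_to_latin1(utf8_path, native, sizeof native) != 0)
        return NULL;
    return fopen(native, mode);
}
```

The point is the shape rather than the converter: if the conversion policy changes, only the wrapper changes, and no call site needs to know which charset the filesystem uses.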
> Conversion needed:
>
> · Messages printed to stdout/stderr or non-XML logfiles need
> conversion
Does this apply to the libraries? A library function which prints to
stdout/stderr is useless for GUI programs, so we generally try to avoid
that. And I don't think we do much in the way of logging.
Anyway, messages presumably need to be localized, not just
charset-converted. (If the message contains a filename, the filename
might need to be charset-converted.) It doesn't add much value to
charset-convert them, other than perhaps filenames, without doing
localization as well.
> · Name service calls such as getpwnam need conversion
> · Command line arguments passed to exec need conversion
I don't see a reason why we should be mucking around in any way with
usernames and command-line arguments provided by the operating system or
user.
Received on Thu May 23 17:56:21 2002