Re: UTF-8 support for Unix with APR?

From: Ryan Schmidt <subversion-2008a_at_ryandesign.com>
Date: Wed, 13 Feb 2008 14:25:41 -0600

On Feb 13, 2008, at 09:14, Erik Huelsmann wrote:

>>>> This is broken. APR should switch to UTF-8 locales internally when it
>>>> deals with filenames (like what GNOME apps do). Otherwise this leads
>>>> to consistency problems when the user has both ISO-8859-1 and UTF-8
>>>> terminal sessions (the reason is that some applications and/or some
>>>> machines do not support multibyte character sets, and one wouldn't
>>>> want to mess everything when running svn in degraded mode, i.e. with
>>>> ISO-8859-1 locales).
>>>
>>> No. The way (non-Mac) unices deal with this is seriously broken. There
>>> is *no* guarantee the actual input paths are the encoding claimed by
>>> the locale settings.
>>>
>>> There is no way for APR to solve that issue. The only thing it can do
>>> is tell the application which input it should expect. Subversion
>>> offers conversion routines to do the actual "locale"->UTF8 path
>>> conversion since Subversion actually *is* UTF8 "inside", meaning that
>>> it's ok for Subversion to err when it encounters invalid (ie non-UTF8)
>>> input. Not all APR applications may find that desirable (for example:
>>> Apache httpd doesn't initialise locale settings, so, it can't do
>>> locale->utf8 conversions [as the C runtime doesn't know what the
>>> current locale is]; nor will it change that behaviour.)
>>
>> It's worse. SVN doesn't get it right either since it's ignorant of
>> unicode normalization forms [1].
>
> Well, yes and no :-) Subversion depends (more so than, say, /bin/ls)
> on a sanely configured environment (locale on disk == locale in
> terminal, locale configured in the first place, etc). This is fine,
> since Subversion needs to operate across different configurations and
> even OSes (whereas /bin/ls does not).

Hold up for a second... I'm havin' a little trouble...

> locale on disk

What is this? I know that on my Mac, I use the HFS+ filesystem which
stores filenames in UTF-16. But that's a character encoding, and it's
not configurable; it's an integral part of the HFS+ specification.
Are you saying there's also an associated locale in the filesystem? I
don't think I've ever been asked to set one, and I don't know how I
would do so nor how I would figure out what it's set to now...

> == locale in terminal,

This is the locale I know about. "LANG=en_US.UTF-8" and so forth.
When I do this, Terminal knows how to display filenames from the disk
correctly because it converts the UTF-16 characters on disk into
UTF-8 characters for display in Terminal. Similarly, various commands
like ls and svn know that I want them to output UTF-8 characters to
the terminal.
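
To make that conversion concrete, here is a rough Python sketch. The filename "é.txt" is made up, and using NFD to stand in for the decomposed form HFS+ stores is an approximation on my part, not something stated in this thread. It also shows the normalization wrinkle mentioned earlier: the UTF-8 bytes derived from the on-disk name are not the bytes of the same name in precomposed form.

```python
import unicodedata

name = "\u00e9.txt"  # hypothetical filename "é.txt"

# Assumption: HFS+ keeps names as UTF-16 in a decomposed form; NFD approximates it.
on_disk = unicodedata.normalize("NFD", name).encode("utf-16-be")

# The bytes a UTF-8 terminal session would be handed for that stored name.
to_terminal = on_disk.decode("utf-16-be").encode("utf-8")

# The same name in precomposed form (NFC), encoded as UTF-8.
typed = unicodedata.normalize("NFC", name).encode("utf-8")

print(on_disk.hex())         # UTF-16 code units of the decomposed name
print(to_terminal)           # "e" followed by a combining acute accent
print(typed)                 # a single precomposed "é"
print(to_terminal == typed)  # the two byte strings differ
```

Both byte strings display as "é.txt" in a UTF-8 terminal, which is why a byte-for-byte path comparison can be fooled by them.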

> locale configured in the first place

What is this? What is "in the first place"? Is that when I first
checked out a working copy? When I first made a repository? When I
first installed Subversion? When I first installed the OS?

It sounded like Vincent was saying that if a working copy is created
under one terminal locale setting, but then accessed with a different
terminal locale setting, things don't work right.
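
If I understand the problem correctly, it comes down to the same bytes being read under two different encodings. A minimal Python sketch, assuming a made-up filename "café.txt"; it only illustrates the byte-level mismatch and says nothing about how svn itself behaves in this case:

```python
name = "caf\u00e9.txt"  # hypothetical filename "café.txt"

utf8_bytes = name.encode("utf-8")          # as written from a UTF-8 session
latin1_bytes = name.encode("iso-8859-1")   # as written from an ISO-8859-1 session

# An ISO-8859-1 session reading the UTF-8 name sees two characters for "é".
misread = utf8_bytes.decode("iso-8859-1")
print(ascii(misread))

# A UTF-8 session reading the ISO-8859-1 name cannot decode it at all.
try:
    latin1_bytes.decode("utf-8")
    readable = True
except UnicodeDecodeError:
    readable = False
print(readable)
```

Either way, the name one session sees is not the name the other session wrote.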

Received on 2008-02-13 21:26:28 CET

This is an archived mail posted to the Subversion Users mailing list.