Re: utf-8 sanity check.

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-07-11 11:36:33 CEST

Ulrich Drepper <drepper@redhat.com> writes:

> All parameters must be interpreted according to the locale of the
> shell. This gets overwritten by the use of env. So all parameters must
> use Latin1. If you'd want some option which overwrites the locale for
> subsequent characters you might get into trouble. E.g., in
>
> svn --locale=en_US.IBM273 foo bar
>
> the <SPACE> you see in the mail is actually a <U0080> in IBM273 which
> might be no separating character in the locale and therefore svn might
> be called with just one parameter (assuming that neither "foo" nor "bar"
> are byte representations for a white-space character in IBM273, I
> haven't checked it). You should get the idea.
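A minimal Python illustration of the byte-level point in the quote. It relies on Python's "cp273" codec (IBM273, German EBCDIC, available since Python 3.4) and is only a sketch. It shows that the byte 0x20, which a Latin-1 shell treats as a space, decodes to U+0080 under IBM273 and so no longer separates words:

```python
# "cp273" is Python's codec for IBM273 (German EBCDIC), added in Python 3.4.
raw = b"foo bar"                  # the bytes as typed in a Latin-1 shell

as_latin1 = raw.decode("latin-1")
as_ibm273 = raw.decode("cp273")

print(as_latin1.split())          # ['foo', 'bar']
print(hex(ord(as_ibm273[3])))     # 0x80: byte 0x20 is U+0080 in IBM273, not a space
print(len(as_ibm273.split()))     # 1: nothing left to split on

# For comparison, the IBM273 space character is the byte 0x40.
print(" ".encode("cp273"))        # b'@'
```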

I'm not sure ASCII-incompatible locales (such as EBCDIC in this case)
would work anyway, since there is bound to be code that assumes that
plain ASCII strings can be used without conversion.
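To illustrate why ASCII-incompatible charsets are a problem, here is a small Python check (a sketch; "cp273" again stands in for IBM273). In an ASCII-compatible charset, the bytes of an option such as `--locale` are identical to the ASCII literal, so code can compare them directly. In EBCDIC they are not:

```python
option = "--locale"

print(option.encode("utf-8") == b"--locale")    # True
print(option.encode("latin-1") == b"--locale")  # True
print(option.encode("cp273") == b"--locale")    # False
print(option.encode("cp273")[:2])               # b'``': '-' is byte 0x60 in EBCDIC
```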

Besides, if you do

env LC_CTYPE=en_US.IBM273 svn foo bar

the shell will parse foo and bar into the argument list of env
_before_ the locale is changed. env will not reparse the list, only
shift away the LC_CTYPE argument. Thus svn will be called with two
parameters anyway.
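The order of events can be modeled in a few lines of Python. This is only a rough model: `simulate_env` is a hypothetical helper, `shlex.split` stands in for the shell's word splitting, and env's own options (such as -i and -u) are not modeled. The shell splits the words once, and env only peels off the leading NAME=VALUE operands before passing the rest on unchanged:

```python
import shlex

def simulate_env(argv):
    """Rough model of env(1): leading NAME=VALUE operands become environment
    assignments; everything after them is the command, passed on as-is.
    Options such as -i and -u are deliberately not modeled."""
    assignments = {}
    i = 1  # argv[0] is "env" itself
    while i < len(argv) and "=" in argv[i] and not argv[i].startswith("="):
        name, value = argv[i].split("=", 1)
        assignments[name] = value
        i += 1
    return assignments, argv[i:]

# The shell splits the line into words once, under the shell's own locale.
words = shlex.split("env LC_CTYPE=en_US.IBM273 svn foo bar")
env_vars, command = simulate_env(words)
print(env_vars)  # {'LC_CTYPE': 'en_US.IBM273'}
print(command)   # ['svn', 'foo', 'bar']
```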

So I don't think this line of reasoning is valid. Of course, the
policy you're suggesting might be as good as any. The thing to
remember with it is this:

Let's say I have a file called `räksmörgås' in the repository. If I
give the option --locale=sv_SE.ISO646-SE to svn, it means that
filenames in the wc should be encoded using this character set, so the
local filename would be `r{ksm|rg}s' if interpreted as US-ASCII.
Let's say that the shell locale is sv_SE.ISO8859-1. Now the correct
way to update the file would then be

svn --locale=sv_SE.ISO646-SE update räksmörgås

as you would expect. (Without the --locale argument, the command
would not work since the file in the wc would not have the expected
filename.) The part that might be unexpected is that tab-completion
will not work, since that would give

svn --locale=sv_SE.ISO646-SE update r\{ksm\|rg}s

and that will fail since there is no file named `r{ksm|rg}s' (that
name can't even be represented in ISO646-SE, so you'll get a recoding
error).
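A small Python sketch of the two cases above. As far as I know, Python's standard library has no ISO646-SE codec, so `encode_iso646_se` is a hypothetical, hand-written partial codec. It covers only the six Swedish letters that ISO 646-SE places at the ASCII positions of [ \ ] { | }. The first call shows the name stored in the wc. The second shows why the tab-completed name fails: the literal `{` has no slot in ISO 646-SE.

```python
# Hypothetical partial codec: only the six Swedish letters that ISO 646-SE
# puts at the ASCII positions of [ \ ] { | }. Other national replacements
# (e.g. at 0x24 or 0x7E) are not modeled.
LETTER_TO_BYTE = {"Ä": "[", "Ö": "\\", "Å": "]", "ä": "{", "ö": "|", "å": "}"}
DISPLACED = set(LETTER_TO_BYTE.values())  # ASCII chars with no slot left

def encode_iso646_se(name):
    out = []
    for i, ch in enumerate(name):
        if ch in LETTER_TO_BYTE:
            out.append(LETTER_TO_BYTE[ch])
        elif ch in DISPLACED or ord(ch) > 0x7F:
            raise UnicodeEncodeError("iso646-se", name, i, i + 1,
                                     "not representable in ISO 646-SE")
        else:
            out.append(ch)
    return "".join(out).encode("ascii")

# The repository name, as stored in the wc under --locale=sv_SE.ISO646-SE:
print(encode_iso646_se("räksmörgås"))  # b'r{ksm|rg}s'

# What tab completion hands svn in a Latin-1 shell: the literal braces.
try:
    encode_iso646_se("r{ksm|rg}s")
except UnicodeEncodeError as err:
    print("recoding error at position", err.start)  # position 1, the '{'
```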

Naturally, there's bound to be some strangeness if the shell locale
and --locale do not match. And not being able to use tab-completion
is probably a minor breakage. So I'm leaning towards this being a
rather good approach. But then again, I never understood the use-case
in which you want to use --locale instead of having the shell locale
set properly...

(Implementation-wise, doing it like this requires some changes to the
code in addition to the cache invalidation function: translation of
non-option arguments is currently deferred until parse_{num/all}_args
or args_to_target_array is called, and we'd need to translate all
arguments before changing the locale. Nothing major though.)
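A minimal Python sketch of the "translate everything first" ordering. `translate_args` is a hypothetical helper, not Subversion's actual C code. Because Python lacks an ISO646-SE codec, an ASCII-compatible charset it does know (KOI8-R) stands in for the --locale charset. Every argument is decoded with the shell's charset before --locale is acted on, and the last line shows what deferring the conversion past the switch would get wrong:

```python
def translate_args(raw_argv, shell_charset):
    """Hypothetical sketch: decode *every* argument with the shell's charset
    first, and only afterwards act on --locale (which switches the charset
    used for working-copy names)."""
    args = [arg.decode(shell_charset) for arg in raw_argv]  # before any switch
    wc_charset = shell_charset
    targets = []
    for arg in args:
        if arg.startswith("--locale="):
            # take the charset after the dot, e.g. "ru_RU.KOI8-R" -> "KOI8-R"
            wc_charset = arg.partition(".")[2] or shell_charset
        else:
            targets.append(arg)
    return wc_charset, targets

raw = [b"--locale=ru_RU.KOI8-R", "räksmörgås".encode("latin-1")]
wc_charset, targets = translate_args(raw, "latin-1")
print(wc_charset, targets)  # KOI8-R ['räksmörgås']

# Deferring the conversion until after the switch would misread the bytes:
print(raw[1].decode(wc_charset) == "räksmörgås")  # False
```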

// Marcus

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Jul 11 11:43:51 2002
