Re: use of UTF-8

From: William Uther <will+_at_cs.cmu.edu>
Date: 2002-05-31 16:54:16 CEST

On 30/5/02 6:45 PM, "Greg Stein" <gstein@lyra.org> wrote:

> On Thu, May 30, 2002 at 05:06:03PM -0500, Karl Fogel wrote:

>> I see three options on the table:
>
> Four.

Five? Or Six? Seven anyone?

     - Add a repository wide property that gives the charset for log
messages. It could vary from 'local' to 'UTF-8' to ...

     - Add a local configuration option that switches the translation on or
off.

     - Interpret not having a locale set as 'use no translation'. If you
set a locale, then svn will use it. If you want to use random charsets,
then don't lie in your locale settings.
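
As a rough illustration of the third option, here is a minimal Python sketch.
It assumes the usual POSIX precedence of LC_ALL, LC_CTYPE, LANG, and treats an
unset locale, "C"/"POSIX", or a locale name without a codeset as "no
translation". The function names are hypothetical and not part of svn.

```python
def charset_from_env(env):
    """Return the codeset named by the locale variables, or None.

    Assumption: LC_ALL overrides LC_CTYPE, which overrides LANG, and an
    empty value counts as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return None  # no locale set at all
    if value in ("C", "POSIX"):
        return None  # assumption: the default locale means "no translation"
    name = value.split("@", 1)[0]  # drop a modifier such as "@euro"
    if "." not in name:
        return None  # locale names no codeset, so nothing to translate from
    return name.split(".", 1)[1]


def to_repository_bytes(raw, env):
    """Translate a log message to UTF-8 only if a locale charset is set."""
    charset = charset_from_env(env)
    if charset is None:
        return raw  # pass the bytes through untouched
    return raw.decode(charset).encode("utf-8")
```

With no locale variables set the bytes go through unchanged; with
LANG=fr_FR.ISO-8859-1 the byte 0xE9 becomes the two-byte UTF-8 sequence
0xC3 0xA9.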

> - add a second parameter to the relevant data structures and routines
> to hold the character set of the string in question (while we're
> talking about log message here, I think there are others; the rule
> for log msgs will apply everywhere)
>
>> - Keep them as char *, declare them UTF-8, and convert user input
>> as best we can.
>>
>> - Keep them as char *, declare no particular charset, but don't
>> allow zero bytes.
>>
>> - Convert them back to counted-length strings and treat them as
>> binary data again (I guess this is the most militantly charset
>> neutral option).
>
> Of the above [seven] approaches:

 1) Would require the implementation of global properties in the repository.
If 'no property' were interpreted correctly then this could be implemented
post-1.0 and still be backwards compatible.

 2) Config options are not always a good idea. Having a local config option
is worse as it removes any guarantees about log messages in the repos.

 3) This is mostly the same as option 2.
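
To make the backwards-compatibility point in 1) concrete, here is a minimal
Python sketch. The property name "log-charset" is hypothetical, and so are two
assumptions: that a repository without the property stores UTF-8 (i.e. the
1.0 behaviour is translation), and that 'local' means "stored untranslated,
read in the client's own charset".

```python
DEFAULT_CHARSET = "UTF-8"  # assumption: what a pre-property repository stores


def log_charset(repo_props):
    """Return the charset of stored log messages.

    A repository with no property is read exactly as before the property
    existed, which is what keeps the scheme backwards compatible.
    """
    return repo_props.get("log-charset", DEFAULT_CHARSET)


def read_log_message(raw, repo_props, client_charset):
    """Decode a stored log message according to the repository property."""
    charset = log_charset(repo_props)
    if charset == "local":
        # Assumption: 'local' means the bytes were never translated, so the
        # reader interprets them in its own charset.
        charset = client_charset
    return raw.decode(charset)
```

An old repository (empty property set) is still read as UTF-8, while a
repository that sets the property to 'local' or to a named charset is decoded
accordingly.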

> [4]) a second param is very heavyweight from a conceptual and coding
> standpoint. and, in the end, we'll probably have to do conversions
> anyways, so allowing an arbitrary charset rather than fixed doesn't
> seem to buy a lot.
>
> [5]) my favorite. note that the *client* does the conversions. the libraries
> simply assume all text strings are in UTF-8.
>
> [6]) untenable for the clients.
>
> [7]) this is similar to ([6]), but we just allow more flexibility.
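
The split described in [5] can be sketched in a few lines of Python. This is
only an illustration of the idea, not svn code: the class and function names
are hypothetical, and the "library" here simply rejects anything that is not
valid UTF-8.

```python
class Library:
    """Stand-in for the repository libraries: accepts UTF-8 only."""

    def __init__(self):
        self.log = []

    def set_log_message(self, data):
        # The library does no conversion; it only checks its assumption.
        data.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
        self.log.append(data)


def client_commit(library, raw_input, input_charset):
    """Client side: convert user input to UTF-8 before calling the library."""
    utf8 = raw_input.decode(input_charset).encode("utf-8")
    library.set_log_message(utf8)
```

All charset knowledge lives in client_commit; handing the library raw
ISO-8859-1 bytes directly fails instead of being stored as-is.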

There seems to be a consensus forming for translation. Might I suggest that
people keep option 1 in mind, so that if repository properties are implemented
at some later stage they can be used to record the charset?

Later,

\x/ill :-}

Received on Sat Jun 1 14:11:58 2002

This is an archived mail posted to the Subversion Dev mailing list.