Re: character set handling (was: Re: [proposal] --targets command line option)

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-03-01 04:24:25 CET

On Fri, Mar 01, 2002 at 02:37:00AM +0100, Marcus Comstedt wrote:
> Greg Stein <gstein@lyra.org> writes:
> > Also note that (today) we don't do the conversion on input, which
> > gives us problems with i18n users. When they use a character with the
> > 8th bit turned on, then things bust cuz it is fed into a UTF-8
> > character parser (which promptly declares it an illegal UTF-8 char).
>
> Conversion on output is of course just as important as conversion on
> input. If I commit the file 'räksmörgås' to the repository, I expect
> it to still be called 'räksmörgås' when I check it out again, and not
> some UTF-8 mumbo jumbo.

That would be the intent, yes.
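
A minimal sketch of the round trip being discussed: convert to UTF-8 on input, convert back on output. It assumes the user's native encoding is ISO-8859-1. The function names are hypothetical and are not Subversion APIs.

```python
# Assumption: the user's locale encoding is ISO-8859-1; a real client
# would take this from the environment instead of hard-coding it.
NATIVE_ENCODING = "iso-8859-1"


def native_to_repos(name: bytes, encoding: str = NATIVE_ENCODING) -> bytes:
    """Input conversion: native-encoded name -> UTF-8 for the repository."""
    return name.decode(encoding).encode("utf-8")


def repos_to_native(name: bytes, encoding: str = NATIVE_ENCODING) -> bytes:
    """Output conversion: UTF-8 name from the repository -> native encoding."""
    return name.decode("utf-8").encode(encoding)


# A Latin-1 file name containing three 8-bit characters (0xE4, 0xF6, 0xE5).
local_name = b"r\xe4ksm\xf6rg\xe5s"

stored = native_to_repos(local_name)    # what the repository would hold
checked_out = repos_to_native(stored)   # what the working copy would get
```

With both conversions in place, `checked_out` equals `local_name` byte for byte, which is the behaviour being asked for.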

> Especially troublesome is the case where the octet sequence of a
> filename actually _does_ make up a legal UTF-8 sequence, because that
> would allow me to commit the file to the repository, but when output
> conversion is implemented, the file changes name!

That will be a problem. Anybody with existing repositories may have issues
when we turn on the conversions. There isn't anything we'll be able to do
about that. There are just way too many points in the system that would need
a flag to say "don't unconvert the name!"
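
To make the rename problem concrete, here is a hedged illustration. It assumes an ISO-8859-1 client and a made-up file name whose raw octets happen to be legal UTF-8. The variable names are illustrative only.

```python
# Assumption: the client's native encoding is ISO-8859-1.
NATIVE_ENCODING = "iso-8859-1"

# In Latin-1 these octets spell "caf" followed by the two characters
# 0xC3 and 0xA9. Read as UTF-8, the same octets spell "caf" + U+00E9.
committed_name = b"caf\xc3\xa9"

# Today: no input conversion, so the raw octets are stored unchanged.
# They pass as UTF-8, so nothing complains at commit time.
stored = committed_name
as_unicode = stored.decode("utf-8")

# Later: output conversion is switched on, and the stored name is
# converted from UTF-8 back to the native encoding at checkout.
checked_out = as_unicode.encode(NATIVE_ENCODING)

name_changed = checked_out != committed_name
```

Here `checked_out` is `b"caf\xe9"`, one octet shorter than the name that was committed, so the file comes back under a different name.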

I seem to recall at one point (like well over a year ago), Jim talking about
wanting to add some code to validate an incoming FS path as legal UTF-8. Of
course, that still wouldn't have solved the above problem.
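
A rough sketch of what such a validation check could look like, and why it would not have caught the ambiguous case. `is_legal_utf8` is a hypothetical helper, not an existing Subversion function, and the sample names are made up.

```python
def is_legal_utf8(path: bytes) -> bool:
    """Return True if the octets of `path` decode as strict UTF-8."""
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Latin-1 octets with the 8th bit set and no valid UTF-8 structure:
# the check rejects these, which is the failure i18n users hit today.
rejected = is_legal_utf8(b"r\xe4ksm\xf6rg\xe5s")

# Latin-1 octets 0xC3 0xA9 that also happen to form one legal UTF-8
# character: the check accepts them, so the wrong name is stored anyway.
accepted = is_legal_utf8(b"caf\xc3\xa9")
```

Validation alone can only say whether the octets are well-formed UTF-8. It cannot tell which encoding the user actually meant.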

> To avoid such disasters, it is IMO imperative that input/output
> conversion is implemented before 1.0.

That's the intent. Especially given that we ship stuff via XML. We *have* to
do something here.

I just worry about all the possible character sets and their conversion
to/from UTF-8. That can be a *LOT* of code.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:37:10 2006

This is an archived mail posted to the Subversion Dev mailing list.
