
Re: Removing the --enable-utf8 flag

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-07-21 07:05:29 CEST

Ulrich Drepper <drepper@redhat.com> writes:
> I suggest one additional test before running the non_ascii test for the
> entire string. Check whether the encoding used is known to be
> ASCII-safe. Only if this test succeeds should the non_ascii tests be
> performed.

Fair warning: I am blatantly trying to manipulate you into doing work.
You are free to refuse :-).

> The checks for ASCII-safeness can be performed by string comparisons
> with the name of the encoding of the incoming data. The names of all
> the safe encodings could be collected. Variations in names can and
> probably should be eliminated by normalization before the comparison.

Okay, makes sense to me.

> An encoding is ASCII-safe if
> From its initial state it is not possible to create a character
> which does not have the ASCII encoding when only using ASCII input
> bytes.

This I don't quite understand.

I understand that there are many encodings that meet this criterion,
including most (all?) of the stateful encodings. But just because the
encoding is "ASCII-safe" doesn't mean that the input ASCII always
bears any meaningful relationship to the output ASCII.

The definition I expected was something like:

   An encoding E is ASCII-safe if it is more or less a superset of
   7-bit ASCII, in which the 7-bit codes mean the same thing in E as
   they do in ASCII. For example, ISO-8859-1 and UTF-8 are both
   ASCII-safe.

The reason I expected this definition is that, under it, we effectively
*get* UTF-8 by simply accepting the characters from the local
encoding. But if we hit a character with the eighth bit set, then we
know the game is over. (The current implementation also filters out a
lot of the control characters, just in case, but that's a detail.)
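
To make the "eighth bit" test concrete, here is a minimal sketch of that
kind of scan. The function name is hypothetical and this is not the
actual code in utf.c; it also leaves out the control-character
filtering mentioned above.

```c
#include <stddef.h>

/* Hypothetical helper, not the real check in utf.c.
   Return 1 if every byte in DATA[0..LEN-1] has its eighth bit clear,
   i.e. the data is plain 7-bit and can be passed through as UTF-8
   when the source encoding is an ASCII superset.  Return 0 as soon
   as a byte with the high bit set is seen. */
static int
seven_bit_clean(const char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if ((unsigned char) data[i] & 0x80)
      return 0;

  return 1;
}
```

An empty string passes trivially, and a single high-bit byte anywhere
in the input is enough to reject it.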

So our task is to compose a list of the encodings that meet this
criterion, and in the non-conversion case in utf.c, make sure that the
client is using one of those encodings before accepting the data.
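
Roughly, I imagine the name check looking something like the sketch
below. Everything in it is an assumption for illustration: the function
names are made up, the list is deliberately short and incomplete, and
the normalization rule (lowercase, keep only letters and digits) is just
one plausible way to fold name variations together.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative, incomplete list of ASCII-superset encodings, stored
   in normalized form.  A real list would need careful review. */
static const char *const ascii_safe_encodings[] = {
  "usascii", "ansix341968", "utf8", "iso88591", "iso885915", NULL
};

/* Lowercase NAME and drop everything except letters and digits, so
   "ISO_8859-1", "iso-8859-1" and "ISO8859-1" all become "iso88591".
   The result is written to OUT, truncated to fit OUTLEN bytes. */
static void
normalize_encoding_name(const char *name, char *out, size_t outlen)
{
  size_t n = 0;

  if (outlen == 0)
    return;

  for (; *name && n + 1 < outlen; name++)
    if (isalnum((unsigned char) *name))
      out[n++] = (char) tolower((unsigned char) *name);

  out[n] = '\0';
}

/* Return 1 if NAME, after normalization, is on the list above. */
static int
encoding_is_ascii_safe(const char *name)
{
  char norm[64];
  size_t i;

  normalize_encoding_name(name, norm, sizeof norm);

  for (i = 0; ascii_safe_encodings[i] != NULL; i++)
    if (strcmp(norm, ascii_safe_encodings[i]) == 0)
      return 1;

  return 0;
}
```

The non-conversion path would call something like
encoding_is_ascii_safe() on the client's encoding name first, and only
then accept 7-bit data as-is. Anything not on the list, such as UTF-16
or ISO-2022-JP, would be rejected.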

Received on Sun Jul 21 07:18:12 2002

This is an archived mail posted to the Subversion Dev mailing list.