Ulrich Drepper <drepper@redhat.com> writes:
> I suggest one additional test before running the non_ascii test for the
> entire string. Check whether the encoding used is known to be
> ASCII-safe. Only if this test succeeds should the non_ascii tests be
> performed.
Fair warning: I am blatantly trying to manipulate you into doing work.
You are free to refuse :-).
> The checks for ASCII-safeness can be performed by string comparisons
> with the name of the encoding of the incoming data. The names of all
> the safe encodings could be collected. Variations in names can and
> probably should be eliminated by normalization before the comparison.
Okay, makes sense to me.
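
A rough sketch of what that name check might look like, in C. The helper names (normalize_encoding_name, is_ascii_safe_encoding) and the short list of encodings are illustrative assumptions, not anything that exists in utf.c, and the normalization rule (drop everything except letters and digits, fold to lower case) is just one plausible choice:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NAME into OUT (capacity OUT_SIZE), keeping only
   ASCII letters and digits and folding letters to lower case, so that
   "ISO-8859-1", "iso_8859-1" and "ISO8859-1" all become "iso88591".
   Returns 0 on success, -1 if OUT is too small. */
static int
normalize_encoding_name(const char *name, char *out, size_t out_size)
{
  size_t n = 0;

  if (out_size == 0)
    return -1;

  for (; *name != '\0'; name++)
    {
      unsigned char c = (unsigned char) *name;

      if (c >= 'A' && c <= 'Z')
        c = (unsigned char) (c - 'A' + 'a');

      /* Skip punctuation, whitespace, and anything non-ASCII. */
      if (!((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')))
        continue;

      if (n + 1 >= out_size)
        return -1;
      out[n++] = (char) c;
    }

  out[n] = '\0';
  return 0;
}

/* Illustrative list only, stored in normalized form. A real list would
   need to be collected and reviewed. */
static const char *const ascii_safe_names[] = {
  "usascii", "ascii", "utf8", "iso88591"
};

/* Hypothetical helper: return 1 if NAME normalizes to an entry in the
   list above, 0 otherwise (including when NAME is too long). */
static int
is_ascii_safe_encoding(const char *name)
{
  char buf[64];
  size_t i;

  if (normalize_encoding_name(name, buf, sizeof(buf)) != 0)
    return 0;

  for (i = 0; i < sizeof(ascii_safe_names) / sizeof(ascii_safe_names[0]); i++)
    if (strcmp(buf, ascii_safe_names[i]) == 0)
      return 1;

  return 0;
}
```

With this, is_ascii_safe_encoding("ISO_8859-1") and is_ascii_safe_encoding("utf8") both succeed, while a name that is not on the list is rejected by default.
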
> An encoding is ASCII-safe if
>
> From its initial state it is not possible to create a character
> which does not have the ASCII encoding when only using ASCII input
> bytes.
This I don't quite understand.
I understand that there are many encodings that meet this criterion,
including most (all?) of the stateful encodings. But just because the
encoding is "ASCII-safe" doesn't mean that the input ASCII always
bears any meaningful relationship to the output ASCII.
The definition I expected was something like:
    An encoding E is ASCII-safe if it is more or less a superset of
    7-bit ASCII, in which the 7-bit codes mean the same thing in E as
    they do in ASCII. For example, ISO-8859-1 and UTF-8 are both
    ASCII-safe.
The reason I expected this definition is that it means we effectively
*get* UTF-8 by simply accepting the characters from the local
encoding. But if we hit an eighth-bit character, then we know the
game is over (the current implementation also filters out a lot of the
control characters, just in case, but that's a detail).
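
To make the eighth-bit test concrete, here is a minimal sketch in C. The function name is_plain_ascii is hypothetical, and the exact set of control characters rejected here (everything below 0x20 except TAB, LF, VT, FF and CR, plus DEL) is an assumption for illustration, not a description of what the current code filters:

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if every byte in DATA[0..LEN) is a 7-bit
   code we are willing to pass through unchanged, 0 otherwise. */
static int
is_plain_ascii(const char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) data[i];

      /* Eighth bit set: not 7-bit ASCII, so we cannot accept the data
         without a real conversion. */
      if (c >= 0x80)
        return 0;

      /* Assumption: reject control characters other than the usual
         whitespace ones (0x09..0x0d), and reject DEL (0x7f). */
      if ((c < 0x20 && !(c >= 0x09 && c <= 0x0d)) || c == 0x7f)
        return 0;
    }

  return 1;
}
```

An empty buffer passes trivially, and a single byte such as 0xe9 (e-acute in ISO-8859-1) is enough to fail the whole string.
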
So our task is to compose a list of the encodings that meet this
criterion, and in the non-conversion case in utf.c, make sure that the
client is using one of those encodings before accepting the data.
?
-K
Received on Sun Jul 21 07:18:12 2002