
Re: converting unconvertible UTF-8 data

From: Ulrich Drepper <drepper_at_redhat.com>
Date: 2002-07-21 08:36:49 CEST

On Sat, 2002-07-20 at 23:05, Karl Fogel wrote:

> 3) Have a fuzzy conversion function that tries to convert all the
> data, but if that fails, converts every character it can and
> replaces the others with ?\XXX (or some standard sequence) to
> indicate the Unicode value of the failed character.

Preferable to this is the use of transliteration. You are describing
a transformation that can lose information anyway. Some iconv()
implementations (glibc's and GNU libiconv's) support transliteration.
Just append //TRANSLIT to the target charset name (the tocode argument)
passed to iconv_open().
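For illustration, here is a minimal C sketch of that call. It assumes a glibc or GNU libiconv iconv(). The helper name utf8_to_translit and its fixed-buffer interface are hypothetical; only iconv_open(), iconv() and iconv_close() are the real API. Depending on the implementation and the current locale, a character with no transliteration rule may come out as '?' or make the call fail with EILSEQ.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in` to the
 * charset `tocode` (e.g. "ASCII") with transliteration requested, writing a
 * NUL-terminated result into `out`. Returns 0 on success, -1 on failure with
 * errno set (e.g. EILSEQ, or E2BIG if `out` is too small). */
int utf8_to_translit(const char *in, const char *tocode,
                     char *out, size_t outsize)
{
    assert(in != NULL && tocode != NULL && out != NULL && outsize > 0);

    /* Build the target name with the //TRANSLIT suffix, e.g. "ASCII//TRANSLIT". */
    char target[64];
    int n = snprintf(target, sizeof target, "%s//TRANSLIT", tocode);
    if (n < 0 || (size_t)n >= sizeof target) {
        errno = EINVAL;
        return -1;
    }

    iconv_t cd = iconv_open(target, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                      /* charset or suffix unsupported */

    char *inbuf = (char *)in;           /* iconv() takes char ** for input */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize - 1;       /* keep one byte for the NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t)-1)
        /* Flush any pending shift sequence (a no-op for stateless targets). */
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);

    int saved_errno = errno;
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    *outbuf = '\0';
    return 0;
}
```

A real caller would size `out` generously, since one input character can expand to several output bytes (for example 'ä' to "ae").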

The problem with transliteration, though, is that it is locale-dependent,
so the result may differ depending on the selected locale.

Just to make myself clear: transliteration here means replacing
some unconvertible input with something readable in the output format.
For example, when converting 'ä' (a-umlaut) to ASCII, it would be replaced
with 'ae' in a German locale. In a Danish locale it would be only 'a',
though.
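A sketch of how a caller might select the locale for a single conversion follows. It assumes glibc, where transliteration rules are taken from the LC_CTYPE locale. The helper ascii_translit_in_locale is hypothetical, and any locale name passed to it (e.g. "de_DE.UTF-8") must actually be installed on the system. Because setlocale() changes process-wide state, this is not safe to call concurrently from several threads.

```c
#include <assert.h>
#include <iconv.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: transliterate the NUL-terminated UTF-8 string `in`
 * to ASCII while the LC_CTYPE locale `locname` is active, then restore the
 * previous LC_CTYPE. Returns 0 on success, -1 if the locale cannot be
 * selected or the conversion fails. Not thread-safe (setlocale is global). */
int ascii_translit_in_locale(const char *locname, const char *in,
                             char *out, size_t outsize)
{
    assert(locname != NULL && in != NULL && out != NULL && outsize > 0);

    /* The string returned by setlocale() may be overwritten by the next
     * call, so copy the current LC_CTYPE name before switching. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char saved[256];
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, locname) == NULL)
        return -1;                      /* locale not installed or unknown */

    int result = -1;
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd != (iconv_t)-1) {
        char *inbuf = (char *)in;       /* iconv() takes char ** for input */
        size_t inleft = strlen(in);
        char *outbuf = out;
        size_t outleft = outsize - 1;   /* keep one byte for the NUL */
        if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) != (size_t)-1) {
            *outbuf = '\0';
            result = 0;
        }
        iconv_close(cd);
    }

    setlocale(LC_CTYPE, saved);         /* restore the caller's locale */
    return result;
}
```

On a glibc system with a German locale installed, passing "de_DE.UTF-8" and "\xc3\xa4" would, per the behavior described above, be expected to produce "ae". The exact output depends on the installed locale data, so treat it as an illustration rather than a guarantee.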

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

Received on Sun Jul 21 08:37:37 2002

This is an archived mail posted to the Subversion Dev mailing list.
