Re: converting unconvertible UTF-8 data

From: Ulrich Drepper <drepper_at_redhat.com>
Date: 2002-07-21 08:36:49 CEST

On Sat, 2002-07-20 at 23:05, Karl Fogel wrote:

> 3) Have a fuzzy conversion function that tries to convert all the
> data, but if that fails, converts every character it can and
> replaces the others with ?\XXX (or some standard sequence) to
> indicate the Unicode value of the failed character.

Preferable to this is the use of transliteration. You are talking
about a transformation which can lose information anyway. Some iconv()
implementations (glibc's and GNU libiconv's) support transliteration.
Just append //TRANSLIT to the to-charset argument of the iconv_open
call.
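
As a rough illustration of the suggestion above, here is a minimal sketch
of such a conversion. The helper name utf8_to_ascii_translit and its
return convention are made up for this example, and it assumes an iconv
implementation that accepts the //TRANSLIT suffix (glibc or GNU
libiconv); other implementations may reject or ignore the suffix.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original message): convert a
 * NUL-terminated UTF-8 string to ASCII, asking iconv to transliterate
 * characters that ASCII cannot represent.
 * Returns 0 on success, -1 on failure with errno set. */
int utf8_to_ascii_translit(const char *in, char *out, size_t outsize)
{
    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }

    /* The suffix goes on the *target* charset, which is the first
     * argument of iconv_open(). */
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;              /* conversion not supported here */

    char *inp = (char *)in;     /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;   /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int saved_errno = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;    /* e.g. EILSEQ, EINVAL or E2BIG */
        return -1;
    }
    *outp = '\0';
    return 0;
}
```

A caller would pass a buffer large enough for the expanded text, since
one input character may turn into several output characters. What a
given non-ASCII character becomes is up to the implementation and the
current locale, so the sketch makes no promise about that.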

The problem with transliteration, though, is that it is locale
dependent: the result may differ depending on the selected locale.

Just to make myself clear: transliteration here means replacement of
some unconvertible input with something readable in the output format.
I.e., when converting 'ä' (a-umlaut) to ASCII it would be replaced in a
German locale with 'ae'. In a Danish locale it would be only 'a',
though.
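
To see the locale dependence for yourself, something along these lines
could be used. It is only a sketch: the helper name translit_in_locale
and its return codes are invented here, and it assumes that the
transliteration rules are taken from the LC_CTYPE category of the
process locale, as in glibc. Locale names such as "de_DE.UTF-8" or
"da_DK.UTF-8" are examples and may not be installed on a given system.

```c
#include <iconv.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: transliterate UTF-8 to ASCII while a given
 * LC_CTYPE locale is active, then restore the previous locale.
 * Returns 0 on success, -1 if the locale is not available,
 * -2 if the conversion itself fails. */
int translit_in_locale(const char *locale, const char *in,
                       char *out, size_t outsize)
{
    if (outsize == 0)
        return -2;

    /* setlocale() may reuse its return buffer, so keep a private copy
     * of the current setting before changing it. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = cur ? malloc(strlen(cur) + 1) : NULL;
    if (saved)
        strcpy(saved, cur);

    int result = -1;
    if (setlocale(LC_CTYPE, locale) != NULL) {
        result = -2;
        iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
        if (cd != (iconv_t)-1) {
            char *inp = (char *)in;     /* input is not modified */
            size_t inleft = strlen(in);
            char *outp = out;
            size_t outleft = outsize - 1;   /* room for the terminator */

            if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1) {
                *outp = '\0';
                result = 0;
            }
            iconv_close(cd);
        }
    }

    if (saved) {
        setlocale(LC_CTYPE, saved);     /* put the old locale back */
        free(saved);
    }
    return result;
}
```

Calling it twice on the same input, once per locale, and comparing the
two output buffers shows whether the installed locales really differ in
their rules. Note that setlocale() affects the whole process, so this is
not something to do casually in a multi-threaded program.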

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------


Received on Sun Jul 21 08:37:37 2002
