Re: Character Encoding

From: Kevin Grover <kevin_at_kevingrover.net>
Date: Wed, 25 Jun 2008 14:59:55 -0700

On Wed, Jun 25, 2008 at 1:19 PM, <boliver_at_lvlomas.com> wrote:
>
>
> I have a file that I store in Subversion. The file is plain text but does
> contain some French character text which needs to be encoded properly with
> UTF-8. These characters are encoding properly in the Subversion system
> because when I view the file through the Tortoise client I see the correct
> characters. But when I update this file on another machine, the character
> encoding isn't correct and the French text comes out garbled.
>
> I am doing the update on an AIX machine into a working directory.
>
> Is there a way I can tell the Subversion client on the AIX box about the
> encoding???
>
> Thanks.

The only changes made to text files are line endings (if you set
svn:eol-style). No transcoding happens.
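
One way to confirm this for yourself is to checksum the file on both
machines and compare. A minimal sketch, assuming Python 3 is available
on both boxes; file_digest is a name made up for this example, not
anything Subversion provides:

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file's raw bytes (no decoding)."""
    sha1 = hashlib.sha1()
    # Open in binary mode so no newline or encoding translation happens.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()
```

If the digests match, the bytes are identical and the garbling comes
from whatever is displaying the file. If svn:eol-style is set to
native, the digests can legitimately differ between Windows and AIX
because of the line endings alone.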

Subversion uses encodings for filenames (and for properties and the
messages it prints out). The contents of the file are _not_ part of
the deal. You need to make sure you edit the file correctly. There are
many editors for Windows that can edit files in UTF-8 or UTF-16
(Notepad can, I believe). You need to make sure you use an editor that
handles the file's encoding properly on both machines. Windows uses
cp1252 as its default encoding (on Western European and US English
systems, at least). I have no idea what AIX uses. If it's a relatively
modern version, it probably uses UTF-8. (You can probably check the
value of the LANG env var, or the output of the locale command.)
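
If Python 3 happens to be installed, the same information can be
pulled out in one go. This is only a sketch; describe_locale is a
made-up helper name, and what it reports depends entirely on how the
machine is configured:

```python
import locale
import os


def describe_locale():
    """Collect the settings that usually decide the default text encoding."""
    return {
        "LANG": os.environ.get("LANG"),      # None if the variable is unset
        "LC_ALL": os.environ.get("LC_ALL"),  # overrides LANG when set
        # Encoding Python would pick for text files by default.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in describe_locale().items():
        print(name, "=", value)
```

Run it on both the Windows machine and the AIX box. If the reported
encodings differ, tools that rely on the system default will interpret
the same bytes differently.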

If you use TSVN, how are you looking at the contents of the file? I
didn't think TSVN had a viewer application?

Some caveats: some encodings (UTF-16, and UTF-8 with a BOM (Byte Order
Mark)) embed magic bytes at the beginning of the file so that readers
can figure out what the encoding is. Most others (plain text files,
for example) carry no such indication. You (as the user) must know
which encoding the file was created with and what the machine it will
be used on expects.
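
To illustrate the BOM case, here is a small sketch in Python 3 that
looks at the first bytes of a file's contents. sniff_bom is a
hypothetical helper, and it only knows about the two encodings
mentioned above (it does not try to handle UTF-32):

```python
import codecs

# BOM byte sequences paired with a Python codec name that consumes the BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    # No BOM: the bytes alone do not say which encoding was used.
    return None
```

A result of None is the common case for plain text, which is exactly
why you have to know the encoding some other way.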

Some editors (Emacs) and languages (Python) look for special markup in
the file (-*- coding: utf-8 -*-) or (-*- coding: latin-1 -*-) and will
use the specified encoding.
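
For the Python case, the standard library exposes the same lookup it
uses on source files. A sketch assuming Python 3 (declared_encoding is
just a wrapper name picked for this example):

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to read this source text."""
    # detect_encoding checks for a BOM and a coding declaration in the
    # first two lines, and falls back to UTF-8 when neither is present.
    encoding, _lines_read = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding
```

Note that the name comes back in a normalized form, so a latin-1
declaration is reported under its ISO name rather than spelled exactly
as written in the file.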

XML files default to UTF-8 if not specified otherwise, but most
text-only editors don't know this and don't do the right thing: when
you insert high-bit characters (ord(x)>127), they just insert the raw
character code from the default encoding of the system.

Because of the above, even if you have a properly encoded file, you
may see garbage when viewing/editing it with a program that is
unaware of the encoding used by the file (it tries to use its own
system default encoding).
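
The effect is easy to reproduce. A minimal Python 3 sketch, where
misread is a made-up helper that encodes text one way and decodes it
another, the way a mismatched viewer would:

```python
def misread(text, actual="utf-8", assumed="cp1252"):
    """Show how text stored as `actual` looks when read as `assumed`."""
    raw = text.encode(actual)
    # errors="replace" substitutes U+FFFD for bytes the assumed
    # encoding cannot decode, instead of raising an exception.
    return raw.decode(assumed, errors="replace")


if __name__ == "__main__":
    # UTF-8 file viewed by a program that assumes cp1252.
    print(misread("Confidentialité"))
    # cp1252 file viewed by a program that assumes UTF-8.
    print(misread("Confidentialité", actual="cp1252", assumed="utf-8"))
```

In the first case each accented letter turns into two odd-looking
characters; in the second it turns into a replacement character. In
both cases the bytes on disk are untouched; only the interpretation is
wrong.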

Hope this helps some. Good luck.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: users-help_at_subversion.tigris.org
Received on 2008-06-26 00:00:19 CEST

This is an archived mail posted to the Subversion Users mailing list.
