
Encoding problems

From: Tim Armes <tarmes_at_fr.imaje.com>
Date: 2004-02-19 09:52:22 CET

I'm stumped. Perhaps someone from this list can help me out with this
problem:

Subversion keeps all the log messages in UTF-8 format. This is very
sensible. When you call a console command, the output is converted to the
console's charset. This makes sense too.

Now, on my Windows server, if I call svnlook using PHP's popen function, the
string I get back is, correctly, in CP850. If I want it to be displayed
correctly on the web page, I use iconv to convert it to ISO-8859-1. This
works.
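The conversion step is the equivalent of PHP's `iconv('CP850', 'ISO-8859-1', $s)`. A minimal Python sketch of the same re-encoding follows; the function name `cp850_to_latin1` is hypothetical. One difference worth knowing: Python raises `UnicodeEncodeError` for a CP850 character with no ISO-8859-1 equivalent (such as a box-drawing character) rather than silently dropping it.

```python
def cp850_to_latin1(raw: bytes) -> bytes:
    """Re-encode console output from CP850 to ISO-8859-1 (Latin-1).

    Decodes the bytes as CP850, then encodes the resulting text as Latin-1.
    Raises UnicodeEncodeError if a character has no Latin-1 equivalent.
    """
    return raw.decode("cp850").encode("latin-1")


# 'é' is byte 0x82 in CP850 but byte 0xE9 in ISO-8859-1.
print(cp850_to_latin1(b"caf\x82"))  # b'caf\xe9'
```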

However, on another user's machine, a Unix box with an Icelandic locale (to
which, unfortunately, I don't have access), the log messages are displayed
like this:

T?\195?\179k ?\195?\186t if statement ?\195?\186r index.php sem hvort
e?\195?\176 er gerast aldrei, en ef ?\195?\190?\195?\166r
skyldu gerast er $ID n?\195?\186 h?\195?\182ndla?\195?\176 ?\195?\161 sama
h?\195?\161tt og $LANG, ?\195?\190.e.a.s. ?\195?\190a?\195?\176 er sett
?\195?\161
default ef userinn gefur ?\195?\190a?\195?\176 ekki.

It's the UTF-8 encoding, except that each byte with the top bit set has been
expanded into a human-readable number. My first assumption was that svnlook
was, for some reason, returning the string as UTF-8, and that PHP's print
function was expanding the characters as above. That's not the case, however:
my tests have shown that PHP doesn't do any such thing. It happily prints
characters with the top bit set as they are.
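The escaped output can be checked by hand. Assuming each `?\NNN` is a single byte written as a three-digit decimal number (for example, `?\195?\179` is 0xC3 0xB3, the UTF-8 encoding of "ó"), the following Python sketch turns the escapes back into bytes and decodes them as UTF-8. The function name `unescape_fuzzy` is hypothetical. Subversion is known to have a fallback that writes non-UTF-8-convertible text in this `?\NNN` form, but whether that is what happens on this particular machine is an assumption, not something the output above proves.

```python
import re

# One escaped byte: a literal '?', a backslash, then three decimal digits.
_ESCAPE = re.compile(r"\?\\(\d{3})")


def unescape_fuzzy(text: str) -> str:
    """Replace each '?\\NNN' escape with byte NNN, then decode the whole as UTF-8.

    Text between escapes is expected to be plain ASCII. A value above 255
    makes bytearray.append raise ValueError.
    """
    raw = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        raw += text[pos:m.start()].encode("ascii")  # literal text before the escape
        raw.append(int(m.group(1)))                 # the escaped byte itself
        pos = m.end()
    raw += text[pos:].encode("ascii")               # trailing literal text
    return raw.decode("utf-8")


print(unescape_fuzzy(r"T?\195?\179k ?\195?\186t if statement"))
# Tók út if statement
```

Applied to the whole message, this recovers ordinary Icelandic text ("Tók út if statement úr index.php ..."), which confirms that the underlying bytes are valid UTF-8.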

The implication, then, is that svnlook itself is returning the string
exactly as shown above, but I don't believe that either. His locale is set
up for ISO-8859-1, so you would expect svnlook to return characters in that
encoding.

The sequence is:

1. Use popen to run svnlook, reading one line at a time with fgets.
2. Store these lines in an array.
3. Print them.
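For comparison, the three steps can be sketched in Python, with `subprocess.Popen` standing in for PHP's popen and line iteration standing in for fgets. This is an illustration, not the original PHP code, and the function name `read_command_lines` is hypothetical. Each line is kept as raw bytes, so no charset conversion happens between reading and printing. The child process inherits the caller's environment, including `LANG`, `LC_ALL` and `LC_CTYPE`, just as a popen'd command does.

```python
import subprocess
import sys


def read_command_lines(argv):
    """Run a command and collect its stdout one line at a time.

    Lines are kept as raw bytes (no decoding), with the trailing newline
    stripped. The child inherits this process's environment unchanged.
    """
    lines = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        for raw in proc.stdout:  # one line per iteration, like fgets
            lines.append(raw.rstrip(b"\r\n"))
    return lines


# With Subversion installed, a call would look like this (hypothetical path):
#   read_command_lines(["svnlook", "log", "/path/to/repos"])
# Portable demonstration using the current Python interpreter as the child:
demo = read_command_lines([sys.executable, "-c", "print('a'); print('b')"])
print(demo)  # [b'a', b'b']
```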

Does anyone have any idea at what point the UTF-8 encoding could be returned
with the top-bit-set characters simply expanded into numeric strings?
Indeed, why should the string be in UTF-8 at all?

Tim

Received on Thu Feb 19 09:52:50 2004

This is an archived mail posted to the Subversion Users mailing list.
