
RE: Encoding problems

From: Tim Armes <tarmes_at_fr.imaje.com>
Date: 2004-02-20 09:24:13 CET

Hi,

I've tried asking this question in the users group, but no one seems able to
help.

My problem is this. Running svnlook to read log messages that contain
accented characters works correctly. When doing svnlook info ... >
filename, the new file created is, as expected, correct. That's to say, the
internal UTF-8 format gets converted to the charset of the local machine.

When using PHP's function to execute the command (redirecting output to a
file) the results can be rather odd, containing text such as:

T?\195?\179k ?\195?\186t if statement ?\195?\186r index.php sem hvort
e?\195?\176 er gerast aldrei, en ef ?\195?\190?\195?\166r
skyldu gerast er $ID n?\195?\186 h?\195?\182ndla?\195?\176 ?\195?\161 sama
h?\195?\161tt og $LANG, ?\195?\190.e.a.s. ?\195?\190a?\195?\176 er sett
?\195?\161
default ef userinn gefur ?\195?\190a?\195?\176 ekki.

It would seem, imo, that svnlook is unable to determine the charset of the
calling environment, and so outputs the text in pure ASCII, expanding the
UTF-8 bytes into escape codes so as to keep them human readable. This is of
course only a hypothesis; I'm hoping that someone here can shed more light on
the internals of the system.
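
To make the garbled output above easier to read while debugging, the escapes can be turned back into text. The following is only a sketch, and it rests on an assumption I have not confirmed against the svnlook source: that each ?\NNN sequence is the decimal value of a single byte of the original UTF-8 message. The function name decode_escapes is made up for this example.

```python
import re

# One or more consecutive escapes of the form ?\NNN (three decimal digits).
ESCAPE_RUN = re.compile(r"(?:\?\\\d{3})+")


def decode_escapes(text: str) -> str:
    """Turn runs of ?\\NNN escapes back into the characters they encode.

    Assumption: NNN is the decimal value of one byte of UTF-8 data.
    """

    def replace(match: re.Match) -> str:
        values = [int(n) for n in re.findall(r"\d{3}", match.group(0))]
        if any(v > 255 for v in values):
            # Not a byte value, so leave the text exactly as it was.
            return match.group(0)
        # Bytes that do not form valid UTF-8 become U+FFFD instead of raising.
        return bytes(values).decode("utf-8", errors="replace")

    return ESCAPE_RUN.sub(replace, text)
```

Under that assumption, passing the start of the sample above (the text T?\195?\179k) to decode_escapes gives back the accented word, since bytes 195 and 179 are the UTF-8 encoding of a single accented letter.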

I discovered this problem running on a Windows server, and while I was
playing around trying to fix it, it suddenly disappeared of its own accord
and never returned. This means that I can't do any more testing at my end.

I have since been contacted by users in other environments who are seeing the
195's everywhere.

Yesterday, Lübbe Onken discovered the problem too. He wrote me this mail:

>I just committed changes to a test project using äöüß and ÄÖÜ in the commit
>message. Now the really bad thing happens: they are displayed correctly by
>WebSVN.
>
>Is it possible that somehow the ?/195 codes got into the database by using
>older clients (TSVN, SVN) whatever? I can't understand why svnlook on the
>console displays them properly, svnlook > file does this as well and svnlook
>from php fails. Is it possible, that some environment variable for php
>containing the locale is not set properly???

Confusing indeed. At this point he had a test database that worked from all
clients (including WebSVN) and one that worked from all clients except
WebSVN.

I suggested, as a test, that he include a call to setlocale(LC_ALL, "");
inside WebSVN, the thinking being that perhaps PHP wasn't setting up the
calling environment correctly and that this would help. Indeed it did: the
195's went away and the accents were displayed.

However, when he removed the line again, the 195's were still gone!

He says:

> The only thing I did between the mails yesterday was restarting apache once,
> because I wedged a repository (not the one in question). Maybe apache had
> it's locale wrong???
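
One way to test the locale suspicion without depending on whatever Apache or PHP happens to pass along would be to run svnlook with an explicitly chosen locale and compare the raw output bytes. This is a rough sketch only. It assumes a Unix-like host where the child process takes its locale from the LC_ALL environment variable (the Windows case may well behave differently), and the helper names and the example locale name are invented for illustration.

```python
import os
import subprocess


def env_with_locale(base_env, locale_name):
    """Return a copy of base_env with LC_ALL forced to locale_name."""
    env = dict(base_env)
    env["LC_ALL"] = locale_name  # assumption: the child honours LC_ALL
    return env


def svnlook_info(repo_path, locale_name):
    """Run `svnlook info REPO` under the given locale; return raw bytes."""
    result = subprocess.run(
        ["svnlook", "info", repo_path],
        env=env_with_locale(os.environ, locale_name),
        capture_output=True,
        check=True,
    )
    # Bytes are returned undecoded so that the caller can see exactly
    # what svnlook wrote, escapes and all.
    return result.stdout
```

Calling svnlook_info on the same repository once with "C" and once with a locale such as "en_US.UTF-8" (if that locale is installed) should show whether the 195's follow the environment of the calling process.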

I'm completely stumped, and my inability to reproduce the problem means that
I can't perform further tests. Does anyone here have any explanations?

Regards,

Tim

Received on Fri Feb 20 09:24:33 2004

This is an archived mail posted to the Subversion Dev mailing list.
