
Re: Cross-Platform Character Encoding Issues

From: Kalin KOZHUHAROV <kalin_at_thinrope.net>
Date: 2005-08-22 18:38:23 CEST

Gustave T. Stresen-Reuter wrote:
> Wow, thanks for the quick reply! That certainly does eliminate the
> possibilities (because I'm pretty certain the Mac respects the internal
> character encoding)... I love JEdit but...

Please do NOT top-post like that...

>> "Gustave T. Stresen-Reuter" <tedmasterweb@mac.com> wrote on 08/22/2005
>> 11:33:06 AM:
>>
>>> We have a multi-platform development environment (windows, mac os x,
>>> linux). On Windows and Linux we use JEdit as the editor and on the Mac
>>> we use BBEdit.
>>>
>>> We're finding that we have a couple of character encoding issues and
>>> don't know how to solve them.
The logical way is to design a simple test case, make sure it reliably
reproduces the problem, and then change one parameter at a time until
you find a fix or workaround.
Check that the files are valid UTF-8 with a tool other than the editors
you normally use.
On Linux, try:
iconv -f UTF-8 -t UTF-16 file > /dev/null
If that succeeds (no error message, exit status 0), the file is valid
UTF-8.
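If iconv is not at hand (on Windows, for instance), roughly the same
check can be sketched in Python. This is only a minimal sketch: the
function names and the script name check_utf8.py are made up for
illustration, and it only tells you whether the bytes are well-formed
UTF-8 and whether they start with a BOM, not whether the text is what
the author intended.

```python
import codecs
import sys


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:  # binary mode: no implicit decoding
            raw = handle.read()
        print(path, "valid UTF-8:", is_valid_utf8(raw), "BOM:", has_utf8_bom(raw))
```

Run it on the same file on each platform. A file that is valid on the
Mac but invalid after being saved on Windows points at the editor, not
at Subversion.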

>>> Specifically, documents checked into Subversion from the Mac (encoded
>>> in utf-8 No BOM) and then checked out onto Windows end up incorrectly
>>> encoded (accented characters display incorrectly). Likewise, documents
>>> created on the windows machines, checked into Subversion and then
>>> checked out onto the Mac end up with "gremlins" (characters that don't
>>> display but are definitely a part of the document).
>>>
>>> I've read in several places that documents checked into Subversion are
>>> converted to utf-8, but if that were true, why would we end up with
>>> these mis-encoded documents? Is it possible that JEdit or some other
>>> aspect of Windows is messing with the encoding and if so, what could it
>>> possibly be?
>>>
>>> This is somewhat urgent so any help resolving this issue is greatly
>>> appreciated.
>>
>>
>> Subversion does not do anything to the contents of your files. The lone
>> exception being that you can ask Subversion to do stuff with the EOL
>> characters and/or expand specific keywords.

Yes, make sure svn:eol-style is "native". If you don't believe
Subversion (then why do you use it :-), make an MD5 sum of the file
before svn add, after svn add, after commit and after checkout on a
different platform. All MD5s should be the same (with svn:eol-style set
to "native", only the line endings may differ between platforms). If
not, post here which ones differ.
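For a checksum that works the same way on all three platforms, something
like the following Python sketch should do. The helper names are
hypothetical. It prints two MD5s per file: one over the raw bytes and
one with line endings normalized to LF, so that a difference caused only
by svn:eol-style can be told apart from a real change in content.

```python
import hashlib
import sys


def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Prints: raw MD5, EOL-normalized MD5, file name
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:  # binary mode: bytes exactly as stored
            raw = handle.read()
        print(md5_hex(raw), md5_hex(normalize_eol(raw)), path)
```

If the raw sums differ between platforms but the normalized ones match,
only the line endings changed. If the normalized sums differ too,
something rewrote the content, and the step at which the sum first
changes tells you which tool did it.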

>> The UTF-8 information you are referring to only applies to the file
>> "metadata", such as the filename or any properties. You are on your own
>> when it comes to the encoding. If your file content is UTF-8 you should
>> just need to use tools that properly recognize that encoding.
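To see what a tool that guesses the wrong encoding does to the text,
here is a small Python sketch. Windows-1252 is only an assumption about
what the Windows side might be using (the thread does not say), and the
function name is made up.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding and decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
    # ascii() keeps the output printable on any console
    print(ascii(misread(e_acute, "utf-8", "cp1252")))  # UTF-8 read as cp1252
    print(ascii(misread(e_acute, "cp1252", "utf-8")))  # cp1252 read as UTF-8
```

In the first case one accented letter turns into two characters (the
UTF-8 bytes C3 A9 shown as "Ã©"). In the second the single byte E9 is
not valid UTF-8, so it comes out as the replacement character U+FFFD.
Neither case involves Subversion changing the bytes.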

Kalin.

-- 
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
Received on Mon Aug 22 18:44:54 2005

This is an archived mail posted to the Subversion Users mailing list.
