
Re: Cross-Platform Character Encoding Issues

From: Gustave T. Stresen-Reuter <tedmasterweb_at_mac.com>
Date: 2005-08-22 20:44:33 CEST

On Aug 22, 2005, at 5:38 PM, Kalin KOZHUHAROV wrote:

> Gustave T. Stresen-Reuter wrote:
>> Wow, thanks for the quick reply! That certainly does eliminate the
>> possibilities (because I'm pretty certain the Mac respects the
>> internal character encoding)... I love JEdit but...
> Please do NOT top-post like that...

Sorry, been a while since I've been corrected on my netiquette!
>>> "Gustave T. Stresen-Reuter" <tedmasterweb@mac.com> wrote on
>>> 08/22/2005 11:33:06 AM:
>>>> We have a multi-platform development environment (Windows, Mac OS X,
>>>> Linux). On Windows and Linux we use JEdit as the editor and on the
>>>> Mac we use BBEdit.
>>>> We're finding that we have a couple of character encoding issues and
>>>> don't know how to solve them.
> The logical way is to design a simple test case, make sure it is
> working (that is, breaking things) and change some parameters until you
> find a fix/workaround.
> Check the files for valid UTF-8 with something other than the editors
> you normally use. On Linux, try:
> cat file | iconv -f UTF-8 -t UTF-16
> If that succeeds, your UTF-8 is valid.

Thanks for the tip. Here's the output when run against the Windows file:
iconv: illegal input sequence at position 4272

Here's the output when run against the Mac version: (no output)

This leads me to believe that the problem is on Windows. Just to make
sure, I ran the same test against a version of the file on Linux and
the test passed as well, so I'm pretty convinced the problem is somehow
related to the Windows version. I'm using TortoiseSVN and will also
test Tortoise against a command line version of Subversion just to make
sure this isn't a side effect of using Tortoise.
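
For machines where iconv isn't handy (the Windows boxes, for example), the same validity check can be sketched in Python. This is only a sketch: the function names are made up for illustration, and it assumes the file is small enough to read into memory. It returns the byte offset of the first invalid sequence, which is roughly the kind of position iconv reports.

```python
import sys


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


def check_file(path):
    """Read a file as raw bytes and report where UTF-8 decoding first fails."""
    with open(path, "rb") as handle:
        return first_invalid_utf8_offset(handle.read())


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py index.html index-win.html
    for name in sys.argv[1:]:
        offset = check_file(name)
        if offset is None:
            print(f"{name}: valid UTF-8")
        else:
            print(f"{name}: invalid UTF-8 at byte offset {offset}")
```

Running it on the same three copies of the file should point at the same copy that iconv complained about.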
>>>> Specifically, documents checked into Subversion from the Mac (encoded
>>>> in UTF-8, no BOM) and then checked out onto Windows end up incorrectly
>>>> encoded (accented characters display incorrectly). Likewise, documents
>>>> created on the Windows machines, checked into Subversion and then
>>>> checked out onto the Mac end up with "gremlins" (characters that don't
>>>> display but are definitely a part of the document).
>>>> I've read in several places that documents checked into Subversion are
>>>> converted to UTF-8, but if that were true, why would we end up with
>>>> these mis-encoded documents? Is it possible that JEdit or some other
>>>> aspect of Windows is messing with the encoding and if so, what could
>>>> it possibly be?
>>>> This is somewhat urgent so any help resolving this issue is greatly
>>>> appreciated.
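
One common way to end up with both symptoms is a mismatch between UTF-8 and a single-byte Windows code page. The sketch below assumes Windows-1252 on the Windows side, which is a guess and not something confirmed in this thread; the function names are made up for illustration. It shows what each side would see if it read the other's bytes with the wrong encoding.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Simulate UTF-8 bytes being displayed by a tool that assumes Windows-1252."""
    # errors="replace" covers the few byte values Windows-1252 leaves undefined.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def misread_cp1252_as_utf8(text: str) -> str:
    """Simulate Windows-1252 bytes being displayed by a tool that assumes UTF-8."""
    # Invalid sequences become U+FFFD, the replacement character.
    return text.encode("cp1252").decode("utf-8", errors="replace")
```

In the first direction an accented character turns into two wrong characters; in the second it turns into a replacement character, which some editors show as an invisible or odd "gremlin".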
>>> Subversion does not do anything to the contents of your files. The lone
>>> exception being that you can ask Subversion to do stuff with the EOL
>>> characters and/or expand specific keywords.
> Yes, make sure svn:eol-style is "native". If you don't believe
> Subversion (then why do you use it :-), make an MD5 sum of the file
> before svn add, after svn add, after commit and after checkout on a
> different platform. All MD5s should be the same. If not, post here
> which ones differ.

Mac: MD5 (index.html) = 45f0f1bc8e0571a025e225dea7f7c353
Lin: MD5 (index-lin.html) = 45f0f1bc8e0571a025e225dea7f7c353
Win: MD5 (index-win.html) = 5384838ac87ceae69925eae16d9e6a7d
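
A small Python sketch for repeating this comparison on every platform with the same tool. The helper names are hypothetical. Besides the raw MD5 it also hashes the content with CRLF converted to LF, on the assumption that part of the difference could be line endings: with svn:eol-style set to "native", a Windows working copy gets CRLF line endings, so the raw sums can differ even when the text is otherwise identical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def md5_report(data: bytes) -> dict:
    """Hash the bytes as-is and again with CRLF line endings converted to LF."""
    normalized = data.replace(b"\r\n", b"\n")
    return {"raw": md5_hex(data), "lf_normalized": md5_hex(normalized)}


def file_md5_report(path) -> dict:
    """Read a file in binary mode (no newline translation) and hash it."""
    with open(path, "rb") as handle:
        return md5_report(handle.read())
```

If the "lf_normalized" sums match across platforms while the "raw" sums do not, only the line endings differ; if they still differ, the bytes of the text itself were changed somewhere.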

I'm leaning toward a Tortoise issue... thanks again for the input!


Received on Mon Aug 22 20:46:48 2005

This is an archived mail posted to the Subversion Users mailing list.
