
Re: Bug with UTF-8 files

From: Igor Paliychuk <mansonigor_at_gmail.com>
Date: Thu, 28 Jul 2011 11:00:48 +0300

I was using Notepad++ to create the file. I set the encoding to UTF-8 without
BOM, edited the file and saved it. The next time I opened that file, the
codepage was autodetected as UTF-8 without BOM. The TortoiseSVN patch viewer
also autodetects the codepage correctly (all characters are OK).

"not in utf-8 but in ansi with broken non1252-chars"
By this I mean that after applying the patch, when I open the created file in
Notepad++, it autodetects the codepage as the default ANSI one and the file
has broken characters. Changing the codepage to UTF-8 (for decoding, I mean)
doesn't help. As far as I can see, the broken characters are the ones that
are not included in 1252. I'll attach an example file later today.
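
To illustrate the symptom, here is a minimal Python sketch of one way it could arise. It assumes (this is a guess, not a confirmed diagnosis of TortoiseSVN) that the text is written back out in Windows-1252 with unrepresentable characters replaced. The sample string is made up for the example.

```python
# Hypothetical sample text: English, Polish and Russian in one file.
original = "hello zażółć привет"

# What the editor saved: UTF-8 without BOM.
utf8_bytes = original.encode("utf-8")

# Assumed failure mode: the text is re-encoded to Windows-1252 and every
# character that 1252 cannot represent is replaced with "?".
lossy_bytes = original.encode("cp1252", errors="replace")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Only the characters that exist in 1252 (here "ó") survive the round trip.
survived = lossy_bytes.decode("cp1252")
print(survived)
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(lossy_bytes))
```

If something like this happened, switching the viewer to UTF-8 afterwards cannot help, because the original bytes of the non-1252 characters are no longer in the file.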

And one last thing. As I understand it, the problem is the BOM? If it is
present, the text is "in all cases" interpreted as UTF. But if there is no
BOM, the codepage can be detected wrongly. Right?
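
A rough way to check a file without relying on an editor's autodetection is to look at the raw bytes. The sketch below is only an illustration: the function name `describe_encoding` and its return strings are made up, and it says nothing about how Notepad++ or TortoiseSVN actually detect encodings.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes by BOM and by strict UTF-8 validity."""
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    # Pure ASCII is valid in UTF-8 and in the ANSI codepages alike,
    # so it cannot tell us anything about the intended encoding.
    if data.isascii():
        return "ascii"
    # Without a BOM, the only hard fact is whether the bytes decode strictly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf-8"
    return "valid utf-8 without BOM"


# Example usage on in-memory samples; for a real file, read it in binary
# mode, e.g. open(path, "rb").read(), and pass the bytes in.
print(describe_encoding("привет".encode("utf-8")))
print(describe_encoding("привет".encode("cp1251")))
```

Running this on the file after step 2 and again after step 4 would show whether the bytes themselves changed or only the editor's guess did.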

2011/7/28 Ulrich Eckhardt <ulrich.eckhardt_at_dominolaser.com>

> On Wednesday 27 July 2011, you wrote:
> > Hi. I'm using TortoiseSVN 1.6.16, Build 21511 and have the following bug:
> >
> > A patch with newly created file(s) in the UTF-8 codepage is applied
> > wrongly. Here is the explanation:
> >
> > 1. Create a new file in UTF-8 (without BOM).
> > 2. Add to it some lines with text in a few languages that have different
> > ANSI codepages (e.g. Russian (ANSI 1251), Polish (ANSI 1250),
> > English (ANSI 1252), etc.).
> > 3. Create a patch using TortoiseSVN. At this stage all looks fine: when
> > you open the patch, the codepage is treated as UTF and all characters
> > are OK.
> > 4. Revert the changes to the tree (or use another tree) and apply the
> > patch. TortoiseSVN will create the needed file, but it will be not in
> > utf-8 but in ansi with broken non1252-chars.
>
> Just to confirm, did you verify with a hex editor or similar tool that the
> file did contain valid UTF-8 after editing (step 2) and that it didn't
> contain valid UTF-8 after applying the patch (step 4)? The point is that
> without the BOM some tools will apply heuristics which can and do fail.
>
> What puzzles me is also your explanation. You say the file is "not in utf-8
> but in ansi with broken non1252-chars", what exactly does that mean? If you
> open a file with text encoded in UTF-8 and interpret its contents
> differently, like e.g. the current single-byte codepage, of course its
> content is garbled.
>
> That said, it could help if you provided the original file, the file after
> editing and the patch that was generated, of course reduced to a sensible
> amount of data (just a few lines, if possible).
>
> Uli
>

------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=4061&dsMessageId=2805189

Received on 2011-07-28 13:30:33 CEST

This is an archived mail posted to the TortoiseSVN Users mailing list.
