On 28.07.2011 09:35, Ulrich Eckhardt wrote:
> On Wednesday 27 July 2011, you wrote:
>> Hi. I'm using TortoiseSVN 1.6.16, Build 21511, and have found the following bug:
>>
>> A patch containing newly created file(s) encoded in UTF-8 is applied
>> incorrectly. Here is how to reproduce it:
>>
>> 1. Create a new file in UTF-8 (without BOM).
>> 2. Add some lines of text in several languages that use different
>> ANSI codepages (e.g. Russian (ANSI 1251), Polish (ANSI 1250),
>> English (ANSI 1252), etc.).
>> 3. Create a patch using TortoiseSVN. At this stage
>> everything looks fine: when you open the patch, it is treated as UTF-8
>> and all characters are correct.
>> 4. Revert the changes to the tree (or use another tree) and
>> apply the patch. TortoiseSVN creates the needed file, but it is not in
>> UTF-8; it is in ANSI, with the non-1252 characters broken.
>
> Just to confirm, did you verify with a hex editor or similar tool that the
> file did contain valid UTF-8 after editing (step 2) and that it didn't contain
> valid UTF-8 after applying the patch (step 4)? The point is that without the
> BOM some tools will apply heuristics which can and do fail.
There is an exact test for UTF-8: a byte sequence either decodes as
well-formed UTF-8 or it does not, so no heuristics are needed.
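
A minimal sketch of such a test, assuming Python is at hand (the helper name
is_valid_utf8 is made up for illustration and is not part of TortoiseSVN).
It only relies on the strict UTF-8 decoder rejecting malformed input:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8, False otherwise."""
    try:
        # Strict decoding raises on any malformed byte sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_utf8.py <file>
    with open(sys.argv[1], "rb") as f:
        print(is_valid_utf8(f.read()))
```

Note that pure ASCII also passes, and that a file in a single-byte codepage
can pass by coincidence, although that is unlikely for real text containing
non-ASCII characters.
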
> What puzzles me is also your explanation. You say the file is "not in utf-8
> but in ansi with broken non1252-chars", what exactly does that mean? If you
> open a file with text encoded in UTF-8 and interpret its contents differently,
> like e.g. the current single-byte codepage, of course its content is garbled.
I can confirm this: the patch was correct UTF-8, but the file created by
the patch was not. All the "funny" characters were replaced by a question
mark, except for the Greek characters: alpha became 'a' and beta became 'ß'.
I've checked this in a hex editor.
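
The question marks are what a lossy conversion to a single-byte codepage
produces. A rough Python sketch, assuming the target is codepage 1252 (the
function name to_codepage_lossy is made up, and this is not TortoiseSVN's
actual code path):

```python
def to_codepage_lossy(text: str, codepage: str = "cp1252") -> bytes:
    """Encode `text` into a single-byte codepage, replacing every
    character the codepage cannot represent with b'?'."""
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    sample = "English, Привет, zażółć"
    # Cyrillic and most Polish letters have no cp1252 equivalent.
    print(to_codepage_lossy(sample))
```

Python's codec replaces alpha and beta with '?' as well. The 'a' and 'ß'
seen here would be consistent with a "best fit" style conversion that picks
look-alike characters, but that is an assumption on my part.
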
Felix
------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=4061&dsMessageId=2805192
Received on 2011-07-28 10:03:11 CEST