Re: Classifying files as binary or text

From: Branko Cibej <brane_at_xbc.nu>
Date: Fri, 13 Nov 2009 10:41:16 +0100

Mike Samuel wrote:
> 2009/11/12 Branko Čibej <brane_at_xbc.nu>:
>
>> The diff contains a mixture of multi-byte and wide-character strings.
>> Depending on whether your UTF-16 is big- or little-endian, it may
>> incorrectly split lines in the middle of a 16-bit code sequence.
>>
>
> I thought BOMs were widely used with UTF-16 for this very reason. Is
> that not the case?
>

I was just describing current behaviour, that's all; not possible
solutions. Like I said elsewhere, the UTF-16/32 issues can be solved
without looking at property contents, because those encodings are
relatively easy to detect thanks to the byte-order mark (U+FEFF,
zero-width no-break space). They just haven't been solved yet.
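
To illustrate the kind of detection meant here, a minimal sketch in
Python (not Subversion code; the name sniff_bom is hypothetical and
only the standard codecs constants are assumed). It looks at the first
bytes of a file and reports which UTF-16/32 byte-order mark, if any,
they start with:

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the 4-byte marks must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(head: bytes):
    """Return the encoding implied by a leading BOM, or None if absent.

    `head` is the first few bytes of the file (hypothetical helper,
    for illustration only).
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

This only covers files that actually carry a BOM, and a UTF-16 LE file
whose first character after the BOM is U+0000 would be misread as
UTF-32 LE, so a real implementation would likely need further checks.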

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2417474
Received on 2009-11-13 10:41:33 CET
