
Re: Classifying files as binary or text

From: Branko Cibej <brane_at_xbc.nu>
Date: Fri, 13 Nov 2009 10:41:16 +0100

Mike Samuel wrote:
> 2009/11/12 Branko Čibej <brane_at_xbc.nu>:
>
>> The diff contains a mixture of multi-byte and wide-character strings.
>> Depending on whether your UTF-16 is big- or little-endian, it may
>> incorrectly split lines in the middle of a 16-bit code sequence.
>>
>
> I thought BOMs were widely used with UTF-16 for this very reason. Is
> that not the case?
>

I was just describing current behaviour, that's all, not possible
solutions. Like I said elsewhere, the UTF-16/32 issues can be solved
without looking at property contents, because those encodings are
relatively easy to detect thanks to the byte-order mark (U+FEFF, the
zero-width non-breaking space). They just haven't been solved yet.

-- Brane
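
As an illustration of the detection idea mentioned in the message, here is a minimal sketch in Python of BOM sniffing on the first bytes of a file. It is not Subversion's code; the function name detect_wide_encoding and the returned labels are assumptions made for this example, and it only covers files that actually start with a BOM.

```python
import codecs

# Check the longer BOMs first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_wide_encoding(head: bytes):
    """Return an encoding label if `head` starts with a UTF-16/32 BOM, else None.

    Hypothetical helper for illustration only. `head` is the first few
    bytes of a file (four bytes are enough).
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

A caller could pass the first four bytes of a file and, on a match, split lines on 16-bit or 32-bit code units instead of single bytes. One known ambiguity: a UTF-16-LE file whose first character after the BOM is U+0000 has the same leading bytes as a UTF-32-LE BOM, so this sketch would label it UTF-32-LE.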

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2417474
Received on 2009-11-13 10:41:33 CET

This is an archived mail posted to the Subversion Dev mailing list.
