
Re: Issue 520 in tortoisesvn: TortoiseMerge fails to detect utf-16 without BOM

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Sat, 20 Jul 2013 20:48:34 +0200

On 20.07.2013 20:32, Oto BREZINA wrote:
>> Already there:
>> double click on the status bar at the bottom where the encoding of the
>> file is shown.
> In the status bar you can change the encoding the file will be stored
> with, not the encoding used to load the file. E.g. there may be an
> ASCII file which is detected as UTF-8 even though it contains bytes
> with values over 127. Then you may need to reload the file with a
> forced format...

Oops, right. That's only for saving.

>>> I had some ideas about this, but since work on it has already started
>>> I won't try to implement them.
>>> They were based on the odd/even positions of 0 bytes, in addition to
>>> newlines and spaces.
>> Curious: why newlines and spaces to detect the encoding?
> In UTF-16 text in, say, Chinese, there are not many 0 bytes, and there
> may be valid values with a zero upper byte as well as ones with a zero
> lower byte. Newlines and spaces are the most probable characters. Their
> byte-swapped counterparts can be valid too, but are really rare
> (0x2000 is en quad; 0x0a00 and 0x0d00 seem to be invalid Unicode).

Interesting.
But I think for now, just counting null chars should be enough. Won't
work for the situations you just mentioned, but for all others it will
work. And it's much better than what we have now which is not detecting
it at all.
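
A minimal Python sketch of the null-counting approach, to make the
trade-off concrete. This is not the actual TortoiseMerge implementation;
the function name and both thresholds (`threshold`, and the factor 4
used to compare the two counts) are assumptions chosen for the example.

```python
# Hypothetical helper for illustration, not TortoiseMerge code.
def guess_utf16_by_nulls(data: bytes, threshold: float = 0.3):
    """Return 'utf-16-le', 'utf-16-be', or None if undecided."""
    if len(data) < 2:
        return None
    even_nulls = data[0::2].count(0)  # zero bytes at offsets 0, 2, 4, ...
    odd_nulls = data[1::2].count(0)   # zero bytes at offsets 1, 3, 5, ...
    pairs = len(data) // 2
    # Mostly-Latin little-endian text has its zero high bytes at odd offsets.
    if odd_nulls / pairs >= threshold and odd_nulls > 4 * even_nulls:
        return "utf-16-le"
    # Mostly-Latin big-endian text has its zero high bytes at even offsets.
    if even_nulls / pairs >= threshold and even_nulls > 4 * odd_nulls:
        return "utf-16-be"
    return None
```

With these assumed thresholds, `"Hello, world\n"` encoded as UTF-16 LE
is recognized, while `"你好世界"` encoded as UTF-16 LE contains no zero
bytes and returns `None`, which is the limitation mentioned above.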

> Just a note: don't forget UTF-32 in the detection, so it is complete
> in one go.

That's a job for maybe 1.9 - right now we only detect those with the
BOM. And to be honest, I've never even had one file that was encoded
like that, so even if we rely on the BOM there, not detecting UTF-32
files without a BOM won't affect many people.
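
For reference, BOM-based detection can be sketched in Python as below.
The function name and table are hypothetical; the BOM constants come
from Python's standard `codecs` module. One detail worth showing: the
UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so
the UTF-32 entries have to be checked first.

```python
import codecs

# Hypothetical lookup table for illustration. Order matters: the UTF-32 LE
# BOM starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def encoding_from_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A UTF-16 LE file whose first character after the BOM is U+0000 would be
reported as UTF-32 LE by this sketch; that case is ambiguous for any
BOM-only check.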

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest interface to (Sub)version control
    /_/   \_\     http://tortoisesvn.net
------------------------------------------------------
Received on 2013-07-20 20:48:45 CEST

This is an archived mail posted to the TortoiseSVN Dev mailing list.
