
Re: bug: Incorrect UTF-16 detection

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: Wed, 7 Oct 2015 20:24:09 +0200

On 07.10.2015 15:45, Sébastien Kirche wrote:
> Hi, I have a program that outputs some small unencoded (= ansi
> encoded) text files. It also incorrectly adds a final null character
> at the end of the file.

And that's the problem: this makes the file *not* an ansi encoded file.

> I have noticed that for files smaller than 50 bytes it breaks the
> Unicode detection of TortoiseMerge, which displays my small files as
> Chinese text, with the encoding wrongly shown as UTF-16LE. If I
> artificially increase the size to over 50 bytes, the file is shown
> correctly.
> It seems that the culprit is in src/TortoiseMerge/FileTextLines.cpp
> at lines 122 (and perhaps 153), in a hack that consists of comparing a
> null character count to the file size divided by 50. For files
> smaller than 50 bytes, any number of null characters will incorrectly
> result in a UTF-16 display.

Since your file is not properly encoded, the detection is correct.
If we changed the value from 50 to something else, the detection of
properly encoded files could break. I admit the value of 50 looks
somewhat arbitrary, but it was chosen after testing a lot of (correctly)
encoded files. You see, UTF-16 encoded files using Chinese chars don't
have many zero bytes in them, because most of those chars are in the
range 256-65535, where the high byte is never zero. So using a different
value could break the detection of those files.
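
To make the trade-off concrete, here is a minimal Python sketch of the kind
of check being discussed. It is not the code from FileTextLines.cpp: the
function name looks_like_utf16, the divisor parameter, and the strict ">"
comparison are assumptions, based only on the description of "null
character count compared to file size divided by 50".

```python
def looks_like_utf16(data: bytes, divisor: int = 50) -> bool:
    """Guess UTF-16 when NUL bytes exceed len(data) // divisor.

    Hypothetical reconstruction for illustration only; the real
    TortoiseMerge check may differ in its details.
    """
    nul_count = data.count(b"\x00")
    threshold = len(data) // divisor  # 0 for anything under 50 bytes
    return nul_count > threshold


# ANSI text with one stray trailing NUL, under 50 bytes: threshold is 0,
# so a single NUL is enough to be taken for UTF-16.
small = b"hello world\x00"

# The same kind of file padded past 50 bytes: threshold is 1, so the
# single trailing NUL no longer trips the check.
padded = b"x" * 60 + b"\x00"

# Real UTF-16LE Chinese text: only the line feeds contribute NUL bytes
# (10 NULs in 180 bytes), which is why the threshold has to stay low.
chinese = ("中文测试中文测试\n" * 10).encode("utf-16-le")

for name, sample in (("small", small), ("padded", padded), ("chinese", chinese)):
    print(name, len(sample), sample.count(b"\x00"), looks_like_utf16(sample))
```

Under these assumptions the small file is classified as UTF-16, the padded
one is not, and the Chinese sample is only recognized because the ratio is
as permissive as 1 in 50.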

So no, this is *not* a bug in TMerge but, as you admitted, a bug in your file.


   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest interface to (Sub)version control
    /_/   \_\     http://tortoisesvn.net
Received on 2015-10-07 20:24:13 CEST

This is an archived mail posted to the TortoiseSVN Users mailing list.