
Re: TMerge and encodings

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: 2006-08-26 09:59:25 CEST

Sven Brueggemann wrote:

> how does TMerge decide which encoding a file is in?
>
> I have two UTF-8 files (both without BOM) - the left one
> is displayed correctly, the right one in ASCII (double
> byte characters as two characters).
>
> When I create a patch of my changes, revert the file,
> and re-apply the patch, the right file is totally screwed
> up - all the lines that were displayed differently are
> missing.
>
> I tried several diff tools and did a hex diff, but I didn't
> find anything that could cause that problem. It might help
> if I knew how TMerge discovers a file's encoding.

* BOMs have priority. If a BOM is present, the encoding set by the BOM
is used.
* If no BOM is present, then TMerge scans the file for invalid UTF-8
sequences. If such an invalid sequence is found, ASCII encoding is used.
* If no BOM is present and no invalid UTF-8 sequence is found, the UTF-8
encoding is used.
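
The three rules above can be sketched roughly as follows. This is only an
illustration in Python, not TMerge's actual code: the function names, the set
of BOMs checked, and the use of Python's strict UTF-8 decoder as the
"invalid sequence" test are all assumptions made for the example.

```python
import codecs

# Assumed BOM table (not taken from TMerge). The UTF-32 entries come first
# because the UTF-32 LE BOM starts with the same bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper that mirrors the three rules from the mail."""
    # Rule 1: a BOM has priority over everything else.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: no BOM, and an invalid UTF-8 sequence is found -> "ascii".
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ascii"
    # Rule 3: no BOM and no invalid sequence -> UTF-8.
    return "utf-8"


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None
```

Under these rules a single invalid byte sequence anywhere in a BOM-less file
is enough to make the whole file fall back to ASCII, so running something like
first_invalid_utf8_offset() over the bytes of the right-hand file may show
where the two files differ.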

In case you're wondering what an invalid UTF-8 sequence is:
http://en.wikipedia.org/wiki/Utf8

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net
Received on Sat Aug 26 09:59:40 2006

This is an archived mail posted to the TortoiseSVN Dev mailing list.
