Re: Problem with UTF-8 files and creating and applying patches

From: Felix Saphir <felix.saphir_at_presswatch.de>
Date: Wed, 17 Feb 2010 15:32:42 +0100

Gert Kello wrote:
>> While you might be correct about TortoiseMerge and BOM, UTF-8 has a
>> defined byte-order, so there is no need for a BOM (see
>> <http://www.unicode.org/faq/utf_bom.html#bom5>).
>
> Well, actually there is. From the same page,
>
> Some protocols allow optional BOMs in the case of untagged text. In those
> cases,
> - Where a text data stream is known to be plain text, but of unknown
> encoding, BOM can be used as a signature. If there is no BOM, the encoding
> could be anything.
>
> That is usually the case with plain-text files such as program source
> code: you do not know which encoding was used.

Correct, but would you really rely on the BOM to detect the encoding?
What if I used an editor unaware of Unicode (and there are plenty) to
insert a byte sequence that has no meaning in UTF-8? You (or your
program) can detect that sequence only by looking at the contents; the
BOM (whether present or not) does not help you at all.
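
To make that concrete, here is a rough Python sketch (my own
illustration, not anything TortoiseMerge or Subversion actually does).
The function names has_utf8_bom and is_valid_utf8 and the sample
strings are made up for the example. It shows a file that carries a
UTF-8 BOM and still contains bytes that are not valid UTF-8:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A file written correctly as UTF-8, with a BOM in front.
good = codecs.BOM_UTF8 + "Grüße".encode("utf-8")

# The same BOM, but the text was inserted by a hypothetical editor
# that writes Latin-1 bytes and knows nothing about Unicode.
bad = codecs.BOM_UTF8 + "Grüße".encode("latin-1")

for label, data in (("good", good), ("bad", bad)):
    print(label, "BOM:", has_utf8_bom(data), "valid UTF-8:", is_valid_utf8(data))
```

Both samples report a BOM, but only the first one passes the content
check, so the BOM alone says nothing about whether the rest of the
file is really UTF-8.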

As the name says, it's a byte order mark, not an encoding marker.

Felix

-- 
  /^\ | ASCII Ribbon Campaign
  \ / | - no HTML in email and news
   x  | http://www.asciiribbon.org/
  / \ | http://www.gerstbach.at/2004/ascii
------------------------------------------------------
http://tortoisesvn.tigris.org/ds/viewMessage.do?dsForumId=4061&dsMessageId=2448380
Received on 2010-02-17 15:32:52 CET

This is an archived mail posted to the TortoiseSVN Users mailing list.
