
[TSVN] Re: New info on hooks in docs

From: Norbert Unterberg <nepo_at_gmx.net>
Date: 2004-11-19 20:51:45 CET

SteveKing wrote:

> Since a text file without BOM's can't be shown correctly at all
> (without doing some guessing about the encoding) there's a standard
> which requires text files to have BOMs in them - if they don't, then
> that means they're not UNICODE (UTF-16, UTF-8, ...) but raw ASCII with
> a codepage (now guess the required codepage to show the file...).

From what I know about Unicode, this is not entirely true.
See:

#1: unicode.org FAQ
http://www.unicode.org/faq/utf_bom.html#28
"Q: How I should deal with BOMs?"

#2: RFC 2376
http://www.faqs.org/rfcs/rfc2376.html
Section 5

#3: XML 1.0 W3C Recommendation
http://www.w3.org/TR/REC-xml/
Section 4.3.3

If I understand all this correctly, then the BOM on UTF-8 XML files is
(according to the Unicode and XML specs) optional; some even say that
UTF-8 XML files SHOULD NOT have a BOM. Only UTF-16 XML entities MUST
have a BOM. Otherwise the encoding is determined by the encoding
declaration (encoding="utf-8"). Even if the encoding declaration is
missing, XML parsers must (should?) assume UTF-8 if no BOM is present.
So XML editors that delete the BOM when saving UTF-8 XML files still
produce valid XML files.
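
To make the detection order above concrete, here is a rough Python
sketch. It is only an illustration of my reading of the specs, not what
any particular parser does. The function name sniff_xml_encoding is made
up, and it assumes that a file without a BOM is ASCII-compatible at
least up to the end of the XML declaration.

```python
import codecs
import re

# Hypothetical helper (not from any library): guess the encoding of an XML
# document from its leading bytes, in the order BOM, declaration, default.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    # 1. A BOM, if present, identifies the encoding family.
    #    Simplification: UTF-32 BOMs are not handled here.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. No BOM: look for an encoding declaration at the very start.
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Neither BOM nor declaration: fall back to UTF-8.
    return "utf-8"

if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b'<?xml version="1.0"?><doc/>'
    without_bom = b'<?xml version="1.0"?><doc/>'
    # Both variants are reported as the same encoding, so stripping the
    # BOM does not change how the file would be decoded.
    print(sniff_xml_encoding(with_bom), sniff_xml_encoding(without_bom))
```

The point of the sketch is that the BOM-less UTF-8 file ends up in the
same branch result as the one with a BOM, which is why dropping the BOM
does not make the file ambiguous to an XML parser.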

On the other hand, I agree that a BOM for UTF-8 files makes a lot of
sense on the Windows platform. But what standard *requires* a BOM on
all Unicode text files?

Norbert

Received on Fri Nov 19 21:03:08 2004

This is an archived mail posted to the TortoiseSVN Dev mailing list.
