
Unicode byte-order mark (BOM)

From: Adal Chiriliuc <adal_at_myrealbox.com>
Date: 2004-03-06 23:04:40 CET

There is a problem in the binary/text detector in Subversion 1.0.0 (Win32).
The Unicode standard defines a so-called byte-order mark (BOM), which is
usually placed at the beginning of a Unicode plain text file. The mark can
have these representations:

EF BB BF - UTF-8
FE FF - UTF-16/UCS-2, big endian
FF FE - UTF-16/UCS-2, little endian
FF FE 00 00 - UTF-32/UCS-4, little endian
00 00 FE FF - UTF-32/UCS-4, big endian
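
As a rough illustration of how these signatures could be checked, here is a
minimal Python sketch. It is not Subversion's detector; the names BOM_TABLE
and detect_bom are made up for this example, and the byte sequences come from
the constants in Python's standard codecs module. The 4-byte marks are tested
first because the UTF-32 little-endian mark starts with the same two bytes as
the UTF-16 little-endian mark.

```python
import codecs

# Hypothetical lookup table (not part of Subversion). Longer signatures come
# first so that FF FE 00 00 is not mistaken for the 2-byte mark FF FE.
BOM_TABLE = [
    (codecs.BOM_UTF32_LE, "UTF-32, little endian"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32, big endian"),     # 00 00 FE FF
    (codecs.BOM_UTF8, "UTF-8"),                      # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16, little endian"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16, big endian"),     # FE FF
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # "hi" in UTF-16 little endian, preceded by the mark.
    print(detect_bom(b"\xff\xfeh\x00i\x00"))
    # Plain ASCII has no mark.
    print(detect_bom(b"plain ASCII text"))
```

One caveat of this ordering: a UTF-16 little-endian file whose first character
after the mark is U+0000 would be reported as UTF-32, so a real detector would
probably need further checks on the rest of the data.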

When you save a plain text file as Unicode from Notepad (Windows XP),
Notepad adds this mark at the beginning of the file. If you then add
that file to a Subversion repository, Subversion marks it as
application/octet-stream. If you remove the byte-order mark and add the
file again (under a different name, of course), Subversion does not mark
it as application/octet-stream.

More info and some ideas on how to determine if a file is Unicode:
http://msdn.microsoft.com/library/en-us/intl/unicode_42jv.asp
http://msdn.microsoft.com/library/en-us/intl/unicode_81np.asp

Adal Chiriliuc

Received on Sat Mar 6 23:06:21 2004

This is an archived mail posted to the Subversion Dev mailing list.
