Re: Subversion and UTF-16 Files

From: Kalin KOZHUHAROV <kalin_at_thinrope.net>
Date: 2005-11-15 07:18:42 CET

Ryan Schmidt wrote:
> On Nov 14, 2005, at 15:09, Ricardo Grünewald wrote:
>
>> I am a C# developer and have recently started to use Subversion.
>> All my source code is saved by Visual Studio 2003 in UTF-16 format.
>> Subversion treats these files as binary, not as text, which hinders
>> file comparison.
>> How can I get Subversion to treat my files as "text"?
>
>
>
> I don't think Subversion can currently see UTF-16 files as text files.
>
> Someone filed an issue about this in January:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2194
>
> The issue was marked invalid because no discussion had taken place. So
> a discussion was started:
>
> http://svn.haxx.se/users/archive-2005-01/0233.shtml
>
> The result of the discussion was that the issue should be reopened and
> marked as a feature request to support UTF-16 and UTF-32, but nobody
> appears to have done so.
The issue has not been reopened, and I cannot do that myself... Who can?

Although we do not use UTF-{16,32} currently, all i18n issues are a big PITA, and in the present
state of the Net and IT as a whole they are just lame. Working with 5 languages, a few encodings
each, with a handful of tools (some proprietary) on at least two distinctly different OSes (let
alone variants) has been a problem for me for the last 10 years or so. Lately, using UTF-8
wherever possible seems to work around most problems.

The wide adoption of English and ASCII (or ISO-8859-1) encodings is the result of their simplicity
and of laziness on the developer side (yes, I do this too). Currently most OSS (say 85%) is
developed in English and ASCII, though the share of i18n-ed OSS and of OSS based on UTF-8 has
increased considerably in the last few years.

Since UTF-8 can represent anything, is well defined, and is widely used across the Net, it is the
best interoperable internal encoding for textual data (and reasonably space efficient). See this
very informative table:
http://www.unicode.org/faq/utf_bom.html#37

Some more insight from:
        http://en.wikipedia.org/wiki/Unicode
        http://en.wikipedia.org/wiki/Comparison_of_unicode_encodings
and the links from there.
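
To make the space-efficiency point concrete, here is a minimal Python sketch that counts the bytes
the same text needs in UTF-8, UTF-16 and UTF-32. The sample strings and the helper name
`encoded_sizes` are made up for illustration; the byte counts exclude any BOM.

```python
# Compare how many bytes the same text needs in each Unicode encoding form.
# SAMPLES and encoded_sizes are illustrative names, not part of any tool.
SAMPLES = {
    "ascii": "Subversion",
    "latin": "Gr\u00fcnewald",                    # contains one non-ASCII letter
    "cyrillic": "\u041a\u0430\u043b\u0438\u043d",  # five Cyrillic letters
    "cjk": "\u65e5\u672c\u8a9e",                   # three CJK ideographs
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, encoded_sizes(text))
```

For ASCII-heavy text such as source code, UTF-8 needs half the bytes of UTF-16; for CJK text UTF-16
is the smaller of the two, which is why "reasonably" is the right word above.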

All that being said, I feel that if Subversion (or any other software) has proper support for
UTF-8, or even better an internal representation of textual data in UTF-8, then interoperability
can be ensured by software such as iconv. UTF-8 can represent any language, so we don't need
anything else.
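
As a rough illustration of the conversion step, the sketch below transcodes between UTF-16 and
UTF-8 using Python's built-in codecs, which is approximately what `iconv -f UTF-16 -t UTF-8` does
on the command line. The function names are hypothetical, and the sketch assumes the UTF-16 input
starts with a BOM (without one, Python falls back to the platform's native byte order).

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes (the BOM selects the byte order) and re-encode as UTF-8."""
    text = data.decode("utf-16")  # a leading BOM is consumed, not kept in the text
    return text.encode("utf-8")


def utf8_to_utf16(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode as UTF-16 with a BOM in native byte order."""
    text = data.decode("utf-8")
    return text.encode("utf-16")


if __name__ == "__main__":
    # Simulate a little-endian UTF-16 source file with a BOM.
    original = codecs.BOM_UTF16_LE + "class Foo {}".encode("utf-16-le")
    print(utf16_to_utf8(original))
```

A tool that stores text internally as UTF-8 would only need such a conversion at its edges, when
reading files from and writing files to the working copy.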

Fixing the bug (my personal classification) that UTF-8 is handled as binary in some (most?)
situations would be a good place to start.
Support for tons of other encodings can then easily be added with external libraries (such as
iconv).
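
To show how text can end up classified as binary, here is a minimal sketch of a content-sniffing
heuristic. This is not Subversion's actual code: the function name, the 1024-byte sample and the
15% threshold are assumptions chosen for illustration, modelled on the kind of check a tool might
do when no MIME type is set.

```python
def looks_binary(data: bytes, sample_size: int = 1024, threshold: float = 0.15) -> bool:
    """Guess whether data is binary by inspecting its first sample_size bytes.

    Hypothetical rule: any NUL byte, or more than `threshold` of the sampled
    bytes falling outside printable ASCII and common whitespace, means binary.
    """
    sample = data[:sample_size]
    if not sample:
        return False  # treat an empty file as text
    if 0 in sample:
        return True   # UTF-16 and UTF-32 text is full of NUL bytes for ASCII characters
    printable = sum(1 for b in sample if 32 <= b < 127 or b in (9, 10, 13))
    return (len(sample) - printable) / len(sample) > threshold


if __name__ == "__main__":
    print(looks_binary("int x = 1;\n".encode("utf-16")))
    print(looks_binary("int x = 1;\n".encode("utf-8")))
    print(looks_binary("\u65e5\u672c\u8a9e\u306e\u30b3\u30e1\u30f3\u30c8".encode("utf-8")))
```

Under such a rule, UTF-16 files are always flagged because of their NUL bytes, and UTF-8 files are
flagged whenever non-ASCII text dominates the sample, while mostly-ASCII UTF-8 passes as text.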

Kalin.

-- 
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
Received on Tue Nov 15 07:22:08 2005

This is an archived mail posted to the Subversion Users mailing list.