
Re: Unicode UTF-16 files detected as binary

From: Joel <rees_at_ddcom.co.jp>
Date: 2005-01-06 02:30:18 CET


(I understand that the subversion developers have time constraints. I'm
not complaining, just offering data points.)

> > For example, I would say that UTF-{16,32} are effectively binary files, in
> > many ways.
> > They can't be diffed, unless you teach the diff program what a lineend
> > is in the new format, and they can't be displayed on most terminals,
> > nor easily shown in email.
> > They require special editors/viewers, just like MSWord docs require
> > special editors.
> Unicode is the new ASCII. The editors are already here. I.e. if
> Notepad.exe can handle it you have to set the bar pretty low :)

All of the systems I work with on a regular basis (Mac, Linux, FreeBSD,
MS Windows) handle Unicode Japanese, and include a default GUI text editor
that is usable for many of the large character set languages. Many of
the terminal programs can handle Unicode Japanese if you set them up to
do so. (Install the fonts and export LANG=ja_JP.UTF-8 or something like
that.) You can get VIM and emacs set up to handle Unicode, as well, if
you look/ask around. I've used Netbeans with Japanese, so I know that
works, and I understand that Eclipse does well, also.

> > Anyway, that's my opinion.
> >
> > I think if svn is going to start treating UTF-16 as text, it at least
> > needs to be taught to diff it properly.

I'm pretty sure I've seen versions of diff that handle the Unicode line
endings correctly in the character subsets I've worked with. I don't
think I have time to dig into the subversion source code, but ask me off
list and I might be able to point towards something useful.
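
As a rough illustration of what "teaching diff about UTF-16" could look like, here is a minimal Python sketch. It is not how Subversion or any particular diff tool works; the function names decode_utf16 and diff_utf16 are made up for this example, and falling back to big-endian when there is no BOM is an assumption. The idea is simply to decode to text first and compare lines afterwards, instead of splitting raw bytes on 0x0A.

```python
import codecs
import difflib


def decode_utf16(data):
    """Decode UTF-16 bytes to str (hypothetical helper).

    Uses the byte order mark when one is present; with no BOM this
    sketch assumes big-endian, which may not suit every file.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # this codec consumes the BOM
    return data.decode("utf-16-be")


def diff_utf16(old, new):
    """Return unified-diff lines for two UTF-16 encoded byte strings."""
    old_lines = decode_utf16(old).splitlines(keepends=True)
    new_lines = decode_utf16(new).splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new"))


if __name__ == "__main__":
    before = "日本語\nline two\n".encode("utf-16")
    after = "日本語\nline 2\n".encode("utf-16")
    # The raw bytes contain NULs, which is why naive checks call them binary.
    print(b"\x00" in before)
    print("".join(diff_utf16(before, after)))
```

Once the bytes are decoded, the line ends are ordinary characters again and the comparison is the same as for any 8-bit text file.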

> I think this should be done. Unicode in UTF-16 is no less valid as
> text than any particular 8-bit code page. The fact that old tools
> don't understand it is precisely the problem that needs to be fixed.
> Note that Java source code can actually be supplied to the compiler in
> UTF-16 format (though I have never heard of anyone doing that

I've done that. In fact, I'm doing it now, as are all of our programmers
and our subcontractors' programmers.

> ) so the
> need to support this isn't as odd as it might appear.

I'll second that, and so will the company I'm working for.

(Again, not complaining, just offering data points.)

> Regards,
> Scott

Joel Rees   <rees@ddcom.co.jp>
digitcom, inc.   株式会社デジコム
Kobe, Japan   +81-78-672-8800
** <http://www.ddcom.co.jp> **
Received on Thu Jan 6 02:33:31 2005

This is an archived mail posted to the Subversion Users mailing list.