
Re: Unicode UTF-16 files detected as binary

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2005-01-05 21:31:53 CET

On Wed, 5 Jan 2005, Max Bowsher wrote:

> Barry Scott wrote:
> > maxb said this is invalid as an issue and that I should read the red
> > text on the issues page. There is nothing in the FAQ and no existing
> > issue, so I guess I need to mail the details to users.
> >
> > I created three test Unicode files on Windows using Notepad.
> > Encoded as UTF-8, UTF-16 LE and UTF-16 BE.
> >
> > The UTF-8 file was added as text. But both UTF-16 files are
> > treated as binary.
> >
> >> svn add utf8.txt utf16-be.txt utf16-le.txt
> > A utf8.txt
> > svn: File 'utf16-be.txt' has binary mime type property
> >
> > I would guess that utf32 files are also treated as binary.
>
> Indeed, it is invalid as an issue at this stage, because it requires
> discussion.
>
That's true. Note (to Barry) that sometimes we reopen issues after
discussion and rephrase them somewhat.

> For example, I would say that UTF-{16,32} are effectively binary files, in
> many ways.

This can be said about UTF8 as well, except for the ASCII subset.

> They can't be diffed, unless you teach the diff program what a line end is in
> the new format, and they can't be displayed on most terminals, nor easily
> shown in email.

True. One way to handle this would be to convert them to UTF8 for internal
processing, so I think it could be supported without very much work. But
let's keep ourselves out of implementation details for now. I wouldn't have
time to work on this currently anyway.
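
For illustration only, the conversion idea can be sketched in Python. This is
not Subversion code: the helper names to_utf8 and diff_as_text are
hypothetical, and the sketch assumes every non-UTF8 file starts with a byte
order mark (as Notepad-written files do) and that files without one are
already UTF8.

```python
import codecs
import difflib

# Check the longer BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf8(data):
    """Hypothetical helper: normalise BOM-marked Unicode text to UTF-8 bytes."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the BOM, decode with the detected encoding, re-encode.
            return data[len(bom):].decode(encoding).encode("utf-8")
    # Assumption: no BOM means the bytes are already UTF-8 (or ASCII).
    return data


def diff_as_text(old, new):
    """Hypothetical helper: line-based diff of two files after normalisation."""
    old_lines = to_utf8(old).decode("utf-8").splitlines(keepends=True)
    new_lines = to_utf8(new).decode("utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new"))
```

Once both sides are normalised, an ordinary line-based diff works even when
the two revisions were saved in different UTF-16 byte orders.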

> They require special editors/viewers, just like MSWord docs require special
> editors.
>
This is stretching it too far. UTF16 is plain text in one encoding. It
isn't ASCII-compatible, but that's another matter. This is very different
from a proprietary word processor format.
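
A small Python sketch of what "not ASCII-compatible" means at the byte level.
The function name encoded_forms is made up for this example. The same two
characters encode to the same bytes in ASCII and UTF8, while UTF16 pads each
one with a zero byte, which is what byte-oriented tools trip over.

```python
def encoded_forms(text):
    """Hypothetical helper: encode the same text in several encodings."""
    names = ("ascii", "utf-8", "utf-16-le", "utf-16-be")
    return {name: text.encode(name) for name in names}


forms = encoded_forms("A\n")
for name, raw in forms.items():
    # Show each encoding as space-separated hex bytes.
    print(name.ljust(10), raw.hex(" "))
```

The line end is a single 0a byte in ASCII and UTF8, but a two-byte unit
(0a 00 or 00 0a) in UTF16, so a diff that splits on the bare 0a byte cuts a
character in half.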

> I think if svn is going to start treating UTF-16 as text, it at least needs
> to be taught to diff it properly.
>
Yes, this is true. I think the issue should be reopened and rephrased to
something like "Support Unicode encodings other than UTF8 as plain text"
and marked as an unscheduled enhancement.

Regards,
//Peter

Received on Wed Jan 5 21:46:12 2005

This is an archived mail posted to the Subversion Users mailing list.
