Re: eol-style and utf-16

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Tue, 31 Oct 2017 11:55:30 +0000

Stefan Sperling wrote on Tue, 31 Oct 2017 10:11 +0100:
> On Mon, Oct 30, 2017 at 09:12:38PM -0400, Nico Kadel-Garcia wrote:
> > It doesn't do much for other UTF difficulties, but it sure avoids the
> > whole inconsistent-EOL issue.
>
> In my opinion the problem under discussion has nothing to do with eol-style.
> Rather, it is that UTF-16 must be treated as binary data in SVN.
>
> The property svn:mime-type should be set to 'application/octet-stream'
> on UTF-16 files.

"application/octet-stream; charset=utf-16" should work too. I don't
remember off the top of my head which tools consume the additional
information --- httpd mod_magic perhaps? --- but they exist. (Sorry, I
don't have time to look up the details right now.)

> And setting svn:eol-style on a binary file is obviously
> not a good idea (unfortunately, these features are not mutually exclusive
> but they should be).
>
> Adding UTF-16 support is not impossible but difficult because Subversion
> as a system assumes UTF-8 strings and won't work correctly with strings
> that contain embedded NUL bytes, and there are a lot of entry points
> for text data in the system.

I'm not sure which part of the system is not NUL-safe. UTF-8 text files with
svn:eol-style set and embedded NULs seem to be handled correctly.
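
To make the UTF-16 side of this concrete, here is a small Python
illustration. It is not Subversion code: naive_lf_to_crlf is a made-up
helper standing in for any translator that rewrites line endings on raw
bytes without knowing the charset. Under that assumption, UTF-8 text with
an embedded NUL survives the rewrite, while UTF-16 text ends up with
misaligned code units.

```python
# Illustration only; not taken from Subversion.

def naive_lf_to_crlf(data: bytes) -> bytes:
    """Byte-wise LF -> CRLF, with no knowledge of the charset (hypothetical)."""
    return data.replace(b"\n", b"\r\n")

# UTF-8 with an embedded NUL: the rewrite leaves valid text behind.
utf8 = "a\x00\nb".encode("utf-8")
utf8_out = naive_lf_to_crlf(utf8)
utf8_ok = utf8_out.decode("utf-8") == "a\x00\r\nb"

# UTF-16-LE: every ASCII character is followed by a NUL byte, and LF is
# the two-byte unit 0A 00. Inserting a single 0D byte shifts everything
# after it by one byte.
utf16 = "a\nb".encode("utf-16-le")
utf16_out = naive_lf_to_crlf(utf16)

try:
    utf16_out.decode("utf-16-le")
    utf16_ok = True
except UnicodeDecodeError:
    utf16_ok = False

print("UTF-8 intact:", utf8_ok)
print("UTF-16 intact:", utf16_ok)
```

A correct translation would have to insert the two-byte unit 0D 00 before
each 0A 00, which requires knowing the encoding and its byte order.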

I agree that in principle it'd be possible to sniff the charset from the
svn:mime-type property and then <handwave>DTRT for UTF-16 files with
svn:eol-style</handwave>. This will happen when someone implements it, aka,
patches welcome.
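
As a rough idea of what that could look like, here is a Python sketch. It
is not a proposal for the actual implementation and uses no Subversion
API: charset_from_mime_type and convert_eol are hypothetical names, the
property value is passed in as a plain string, and UTF-16 data without a
BOM is rejected rather than guessed at.

```python
import codecs
from typing import Optional


def charset_from_mime_type(value: str, default: Optional[str] = None) -> Optional[str]:
    """Return the charset parameter of a MIME type string, lowercased."""
    _, *params = value.split(";")
    for param in params:
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset":
            return val.strip().strip('"').lower() or default
    return default


def convert_eol(data: bytes, charset: str, eol: str = "\n") -> bytes:
    """Decode, normalize line endings to `eol`, and re-encode.

    For plain "utf-16" the byte order is taken from the BOM, which is
    preserved in the output. This is an assumption of the sketch.
    """
    bom = b""
    codec = charset
    if charset == "utf-16":
        if data.startswith(codecs.BOM_UTF16_LE):
            bom, codec = codecs.BOM_UTF16_LE, "utf-16-le"
        elif data.startswith(codecs.BOM_UTF16_BE):
            bom, codec = codecs.BOM_UTF16_BE, "utf-16-be"
        else:
            raise ValueError("utf-16 data without a BOM; byte order unknown")
    text = data[len(bom):].decode(codec)
    # Collapse CRLF and lone CR to LF first, then expand to the target EOL.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return bom + text.replace("\n", eol).encode(codec)


prop = "application/octet-stream; charset=utf-16"
charset = charset_from_mime_type(prop)
mixed = codecs.BOM_UTF16_LE + "a\r\nb\n".encode("utf-16-le")
as_lf = convert_eol(mixed, charset, "\n")
as_crlf = convert_eol(mixed, charset, "\r\n")
```

Decoding before touching line endings is the point of the exercise: the
translation then operates on characters, so the width and byte order of a
code unit stop mattering.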

Cheers,

Daniel
Received on 2017-10-31 12:55:40 CET

This is an archived mail posted to the Subversion Users mailing list.