Re: eol-style and utf-16

From: Stefan Sperling <stsp_at_apache.org>
Date: Tue, 31 Oct 2017 10:11:19 +0100

On Mon, Oct 30, 2017 at 09:12:38PM -0400, Nico Kadel-Garcia wrote:
> On Mon, Oct 30, 2017 at 4:57 PM, engelbert gruber
> <engelbert.gruber_at_gmail.com> wrote:
> > hi
> >
> > checking in a file with eol-style native on unix : eol = 0x0a
> > checking it out on windows : 0x0a is replaced by 0x0d 0x0a
> >
> > when the file is in utf-16 : eol is 0x00 0x0a
> > and when checked out on windows this becomes : 0x00 0x0d 0x0a
> >
> > which breaks utf-16 as far as i understand it
> >
> > possible fixes:
> >
> > * get utf-## aware
> > * add a charsize property
> > * document it
> > * recommend a non-native eol-style: LF, CR, or CRLF
> >
> > all the best
> > e
>
>
> So, easy solution. *Never* use eol-style.

I would not point at svn:eol-style as the root cause here.
This feature works fine with text files.

> It's destructive to any
> working copy that may be accessed via operating systems with distinct
> eol styles.

It works fine unless the operating system is so obscure that it uses
something other than LF, CRLF, or CR as a newline sequence.

> And its destructiveness is insidious when files are
> edited, locally, with editors that auto-interpret EOL on the fly,
> leading to inconsistent EOL and EOL confusion when creating new files
> in the repo.

If an editor decides to change all the newlines, this creates
a diff where every line in a text file appears as changed,
even if just a single line was modified by the editor's user.
That's a problem svn:eol-style can solve.

If an editor decides to create inconsistent newlines, it has broken
the file. All you can do now is treat it as a binary file because
text content cannot be split into lines anymore. I would put the
blame on the editor here.

> It doesn't do much for other UTF difficulties, but it sure avoids the
> whole inconsistent EOL issues.

In my opinion the problem under discussion has nothing to do with eol-style.
Rather, it is that UTF-16 must be treated as binary data in SVN.
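
To illustrate what the original poster observed, here is a minimal sketch
in Python. It is not Subversion's actual translation code; the helper
naive_lf_to_crlf is a hypothetical stand-in for any newline translation
that works on bytes without knowing the text encoding. The sample text
and the big-endian byte order are assumptions, chosen to match the
0x00 0x0a sequence from the report.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Hypothetical byte-level LF -> CRLF translation, unaware of the encoding."""
    return data.replace(b"\n", b"\r\n")


text = "a\nb\n"
expected = "a\r\nb\r\n"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # LF is encoded as 0x00 0x0a

utf8_out = naive_lf_to_crlf(utf8)
utf16_out = naive_lf_to_crlf(utf16)

# UTF-8: LF is the single byte 0x0a, so the byte-level translation is correct.
utf8_ok = utf8_out.decode("utf-8") == expected

# UTF-16: only the 0x0a byte is matched, so 0x00 0x0a becomes 0x00 0x0d 0x0a.
# Every code unit after the first newline is now misaligned by one byte.
utf16_ok = utf16_out.decode("utf-16-be", errors="replace") == expected

# What an encoding-aware translation would have had to produce instead.
utf16_wanted = expected.encode("utf-16-be")

print(utf8_ok, utf16_ok)
print(utf16_out.hex(" "))
print(utf16_wanted.hex(" "))
```

The UTF-8 result decodes to the expected text, while the UTF-16 result
does not, because a correct CR in UTF-16 needs its own 0x00 byte.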

The property svn:mime-type should be set to 'application/octet-stream'
on UTF-16 files. And setting svn:eol-style on a binary file is obviously
not a good idea (unfortunately, these features are not mutually exclusive
but they should be).

Adding UTF-16 support is not impossible but difficult because Subversion
as a system assumes UTF-8 strings and won't work correctly with strings
that contain embedded NUL bytes, and there are a lot of entry points
for text data in the system.
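
The embedded NUL problem can be shown in general terms with a short
Python sketch. This says nothing about specific Subversion code paths;
the helper as_c_string is a hypothetical model of NUL-terminated string
handling, and the sample text is an assumption.

```python
def as_c_string(data: bytes) -> bytes:
    """Model of NUL-terminated handling: keep only the bytes before the first NUL."""
    return data.split(b"\x00", 1)[0]


text = "eol"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # every ASCII character gains a 0x00 byte

nul_count = utf16.count(b"\x00")
seen_utf8 = as_c_string(utf8)    # the whole string survives
seen_utf16 = as_c_string(utf16)  # cut off at the very first byte

print(nul_count, seen_utf8, seen_utf16)
```

Code that treats the first NUL byte as the end of a string sees the full
UTF-8 text but nothing at all of the UTF-16 text.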
Received on 2017-10-31 10:11:47 CET

This is an archived mail posted to the Subversion Users mailing list.
