
Re: Newlines, preserving data, and multiple access paths

From: Colin Putney <cputney_at_whistler.com>
Date: 2001-12-14 23:43:54 CET

William Uther wrote:

> --On Friday, 14 December 2001 1:16 PM -0500 Greg Hudson
> <ghudson@MIT.EDU> wrote:
>
>> If newline-style is LF, CR, or CRLF, translate <native newline style>
>> -> <requested newline style>. If we notice any CRs or LFs which aren't
>> part of a native-style newline and aren't part of a requested-style
>> newline, abort the commit. If the commit succeeds, apply the <native
>> newline style> -> <requested newline style> translation to the working
>> copy as well, so that it matches what we would get from a checkout of
>> the new rev.
>
> I don't think this preserves reversibility. If a file contains BOTH
> <native-style newline> and <requested-style newline> then you need to
> abort. If you translate just <native-style newline> then you can't
> undo the transformation - you don't know which newlines need to be
> untransformed.
>
> Stated simply: You should only translate when the newline style is
> entirely consistent. Anything else removes the inconsistency and hence
> loses information.

True, this scheme doesn't preserve reversibility. But in this case
that's OK, because the newline-style decrees what the newline style must
be. If there are native-style newlines mixed in with the requested-style
newlines, this is probably the result of corruption by some
native-newline-obsessive user tool. So the non-reversible transform will
actually undo the corruption.

For example, take the file foo.dsp, which has a newline-style of CRLF. It's
stored in the repository with CRLF newlines and on checkout, no
transformation is done. If Linus checks out the file and edits it in an
old version of emacs, any lines he adds will be terminated with a bare
LF. Since this is his native style of newline, the transformation Greg
described will undo this damage.
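
To make that concrete, here is a rough Python sketch of the commit-time
translation as I read Greg's description. It is not Subversion code: the
names (translate_for_commit, MixedNewlineError) are made up for
illustration, and treating CRLF as a single token only when one of the
two styles is CRLF is my own assumption about how "part of a newline"
should be decided.

```python
import re

NEWLINES = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}


class MixedNewlineError(ValueError):
    """A CR or LF was found that belongs to neither newline style."""


def translate_for_commit(data: bytes, native: str, requested: str) -> bytes:
    """Translate native-style newlines to the requested style, or abort."""
    native_nl = NEWLINES[native]
    requested_nl = NEWLINES[requested]

    # Assumption: CRLF counts as one newline only if one of the two
    # styles in play is CRLF; otherwise CR and LF are judged separately.
    if b"\r\n" in (native_nl, requested_nl):
        pattern = re.compile(rb"\r\n|\r|\n")
    else:
        pattern = re.compile(rb"\r|\n")

    def fix(match):
        token = match.group(0)
        if token == requested_nl or token == native_nl:
            return requested_nl
        # A stray CR or LF: abort the commit instead of guessing.
        raise MixedNewlineError(
            "stray newline byte(s) %r at offset %d" % (token, match.start())
        )

    return pattern.sub(fix, data)


# Linus (native LF) adds a line to foo.dsp (requested CRLF).
edited = b"first line\r\nline added in emacs\nlast line\r\n"
repaired = translate_for_commit(edited, native="LF", requested="CRLF")
```

Under those assumptions, repaired has CRLF on every line, and the same
bytes would be written back to Linus's working copy so it matches a
fresh checkout. A bare CR in the same file would raise the error and
abort the commit.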

If the newline-style is set to a specific style (i.e., CR, LF, or
CRLF), then we know that (1) the file is text, not binary, and (2) any
other style of newline present is corruption.

A file should not be marked with a specific newline style unless (1)
the user does so explicitly, or (2) it matches some heuristic when it's
added, *and* the file contents conform to that newline style.
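
Roughly, the rule at add time could look like the Python sketch below.
Everything in it is hypothetical: the extension table, the function
names, and the choice to treat a '\0' byte as a sign of binary data are
my assumptions, not anything svn does today.

```python
import os
import re

NEWLINES = {"LF": b"\n", "CR": b"\r", "CRLF": b"\r\n"}

# Hypothetical heuristic table: extension -> newline style.
EXTENSION_STYLES = {".dsp": "CRLF"}


def conforms(data: bytes, style: str) -> bool:
    """True if data looks like text using only the given newline style."""
    if b"\0" in data:
        return False  # assumed harbinger of a binary file
    newline = NEWLINES[style]
    # Every newline token (CRLF, bare CR, or bare LF) must match the style.
    return all(m.group(0) == newline
               for m in re.finditer(rb"\r\n|\r|\n", data))


def style_on_add(filename: str, data: bytes, explicit=None):
    """Pick a newline style for a newly added file, or None."""
    if explicit is not None:
        return explicit  # (1) the user set it explicitly
    ext = os.path.splitext(filename)[1].lower()
    guess = EXTENSION_STYLES.get(ext)
    if guess is not None and conforms(data, guess):
        return guess     # (2) heuristic match *and* contents conform
    return None          # leave the file unmarked
```

With this sketch, a foo.dsp containing even one bare LF is left
unmarked when it is added, so no translation is ever applied to it
unless the user asks for one.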

So the only real possibility for corruption is if some user tool creates
a binary file that matches a heuristic for a specific newline style. In
our running example, William creates a vector graphics file called
foo.dsp and adds it. By chance, this file happens to have CRLFs
scattered through it, but no bare CRs, LFs, '\0' characters or other
harbingers of binary files. On the commit, svn will notice the
extension, set the newline-style to CRLF and send it to the repository.
William may get an error if he tries to commit a change that introduces
a bare CR or LF, but he won't corrupt the file.

Linus can corrupt the file if he makes a change that introduces a bare
LF, which will get transformed into CRLF on commit. Alternatively,
Madeleine (was that her name?) can introduce a bare CR and commit, which
will also corrupt the file.
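
A tiny Python illustration of why this counts as corruption for a
binary file: two different payloads collapse to the same bytes once
Linus's bare LF is translated, so the original cannot be recovered. The
function name and the byte values are invented for the example, and the
regex only models the LF -> CRLF case.

```python
import re


def commit_from_lf_platform(data: bytes) -> bytes:
    """Turn every bare LF (one not preceded by CR) into CRLF."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


# William's binary "foo.dsp" payload, and Linus's edit that adds a bare LF.
original = b"\x01\x02\r\n\x03\x04\r\n"
linus_edit = b"\x01\x02\r\n\x03\x04\n"

stored = commit_from_lf_platform(linus_edit)

# The byte Linus wrote (a lone 0x0A) is gone; the stored data is
# indistinguishable from a file that had CRLF there all along.
same_as_original = stored == commit_from_lf_platform(original)
```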

That's a pretty long string of unlikely coincidences though, while the
opposite case, where this transformation *fixes* corruption, is quite
common.

Colin

Received on Sat Oct 21 14:36:53 2006

This is an archived mail posted to the Subversion Dev mailing list.
