
Yet another line-end proposal (YALEP?)

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2001-12-14 04:52:59 CET

Greg Hudson <ghudson@mit.edu> writes:

> 1. We can avoid irrevocably destroying data if we make sure all
> newline translations we do are reversible. A newline translation
> is reversible if there are no CRs or LFs in the file which aren't
> source-format newlines.

This is the property I have been trying to use.
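Greg's condition can be written down as a short check. What follows is a minimal Python sketch. It assumes file contents are handled as raw bytes and that the source format is a single newline sequence. `line_ending` and `is_reversible` are hypothetical helpers, not existing Subversion code.

```python
def line_ending(line: bytes) -> bytes:
    # Return the newline sequence that terminates `line`, or b"" if none.
    if line.endswith(b"\r\n"):
        return b"\r\n"
    if line.endswith((b"\n", b"\r")):
        return line[-1:]
    return b""


def is_reversible(data: bytes, source_eol: bytes) -> bool:
    # True if every CR and LF in `data` belongs to a `source_eol` newline.
    # bytes.splitlines() splits only on LF, CR and CRLF, so the body of each
    # line contains no CR/LF and only its terminator needs checking.
    return all(line_ending(line) in (source_eol, b"")
               for line in data.splitlines(keepends=True))


# Plain LF text: converting to CRLF and back loses nothing.
print(is_reversible(b"abc\ndef\nghi\n", b"\n"))       # True
# A stray CR (as in "binary\r\n") is not an LF newline, so not reversible.
print(is_reversible(b"some\nbinary\r\ndata", b"\n"))  # False
```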

Background (feel free to skip this paragraph; it explains why I want
this, not how it works): I worked with ClearCase on a C++ project in a
mixed Unix/NT environment earlier this year. ClearCase does line-ending
conversion on a per-view basis for text files (a view is a
working-copy in Subversion terms). We used views that were mounted on
both Unix and NT boxes, i.e. one view was simultaneously mounted on
both machines. This made cross-platform development much easier, as
changes could be built on both platforms without being either checked
in or manually copied between views. However, it caused real problems
if line-ending conversion was enabled. A view set up to use NT
line-endings, say, was hard to use from a Unix box, since every line
had already changed. Merging, which is normally a ClearCase strong
point, was disrupted if a set of line-end changes got erroneously
committed. We ended up abandoning the line-end conversion and using
pre-commit triggers to produce the line-endings we wanted (.ds[pw]
files had CRLF, source code plain LF). This initially disposes me to
support no line-end conversion at all, and, if it is present, for the
default to be off. However, given that lots of people want it, take a
deep breath and here goes.

Proposal
========

 Rules:

 - The text-base always duplicates the repository.

 - Any sort of line-ending can appear in any file in the repository.

 - There is a native-line-end property that can be set on a file. I am
   not sure whether this is a separate property from the text/binary
   thing, as I am not sure what the text/binary thing does at present!

 Rules when native-line-end is not set:

 - If the property is not set, no line-end conversion occurs. The
   working-copy duplicates the repository. Files get committed exactly
   as they appear in the working-copy, just like a straight binary file.

 Rules when native-line-end is set:

 - At check-out/update/revert, convert all line-endings in the working
   copy to whatever the platform requires. Store the platform line-end
   property in the .svn/entries file (or wherever) to allow check-out
   with a Unix client and check-in with a non-Unix client, or vice
   versa. The .svn/entries property is "none" or "LF" or "CRLF" etc.,
   i.e. an explicit line-ending and not just "native".

 - At check-out/update/revert there is a -no-convert option to disable
   line-end conversion, overriding the native-line-end property. This
   also changes the line-end property in the .svn/entries file.

 - At commit, check the .svn/entries file to determine the
   line-end property. When generating the diff between the
   working-copy and the text-base, ignore any line-end difference
   that is explained by the line-ending conversion. If the
   introduced line-endings are incompatible with the .svn/entries
   line-end property, display an error.
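The check-out side of these rules might look roughly like the Python below. This is only an illustration under stated assumptions. The .svn/entries values are taken to be the strings "none", "LF" and "CRLF". `entries_line_end` and `to_working_copy` are hypothetical names, not Subversion APIs.

```python
# Hypothetical mapping from the .svn/entries line-end value to bytes.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def entries_line_end(native_line_end: bool, platform: str, no_convert: bool) -> str:
    # Value recorded in .svn/entries at check-out/update/revert: always an
    # explicit style ("none", "LF", "CRLF"), never just "native".
    if not native_line_end or no_convert:
        return "none"
    return platform


def line_ending(line: bytes) -> bytes:
    if line.endswith(b"\r\n"):
        return b"\r\n"
    if line.endswith((b"\n", b"\r")):
        return line[-1:]
    return b""


def to_working_copy(text_base: bytes, entries_eol: str) -> bytes:
    # "none": the working-copy duplicates the repository byte for byte.
    if entries_eol == "none":
        return text_base
    eol = EOL_BYTES[entries_eol]
    out = []
    for line in text_base.splitlines(keepends=True):
        ending = line_ending(line)
        body = line[:len(line) - len(ending)]
        # Replace whatever ending the line has; an unterminated last line stays so.
        out.append(body + (eol if ending else b""))
    return b"".join(out)


eol = entries_line_end(native_line_end=True, platform="CRLF", no_convert=False)
print(eol)                                       # CRLF
print(to_working_copy(b"abc\ndef\nghi\n", eol))  # b'abc\r\ndef\r\nghi\r\n'
```

Because the recorded value is explicit, a working-copy checked out by a CRLF client keeps saying "CRLF" even if it is later committed from an LF machine.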

 Diff Algorithm:

 The diff algorithm is basically as follows: do the line-ending
 conversion specified in the .svn/entries file on the text-base to
 generate the pristine working-copy. Diff the pristine working-copy
 and the actual working-copy. Within the diff, undo the line-ending
 conversion for those parts that represent the text-base. Within the
 diff, verify that all line-endings in those parts that represent the
 working-copy are consistent with the .svn/entries property. This
 diff is now suitable to send to the repository.
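Here is a sketch of this diff in Python. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever line diff Subversion would really use, since it is not clear that vdelta can work this way. The function name `commit_diff` and the (op, line) output format are assumptions for illustration. The conversion is done line by line, so line i of the pristine working-copy corresponds to line i of the text-base. That is what makes undoing the conversion on the "-" side a simple lookup.

```python
import difflib

# Hypothetical mapping of .svn/entries values; "none" means no conversion.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def line_ending(line: bytes) -> bytes:
    if line.endswith(b"\r\n"):
        return b"\r\n"
    if line.endswith((b"\n", b"\r")):
        return line[-1:]
    return b""


def convert_line(line: bytes, eol: bytes) -> bytes:
    ending = line_ending(line)
    return line[:len(line) - len(ending)] + (eol if ending else b"")


def commit_diff(text_base: bytes, working: bytes, entries_eol: str):
    # Returns [(op, line), ...]; "-" lines are text-base bytes, "+" lines
    # are working-copy bytes. Raises ValueError on a conflicting line-end.
    eol = EOL_BYTES.get(entries_eol)  # None for "none"
    base = text_base.splitlines(keepends=True)
    # 1. Convert the text-base line by line: the pristine working-copy.
    pristine = [convert_line(line, eol) if eol else line for line in base]
    wc = working.splitlines(keepends=True)
    diff = []
    # 2. Diff the pristine working-copy against the actual working-copy.
    matcher = difflib.SequenceMatcher(a=pristine, b=wc, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # differences explained by the conversion drop out here
        # 3. Text-base side: undo the conversion by emitting the original lines.
        diff.extend(("-", line) for line in base[i1:i2])
        # 4. Working-copy side: new line-endings must match .svn/entries.
        for line in wc[j1:j2]:
            if eol and line_ending(line) not in (eol, b""):
                raise ValueError(f"{line!r} conflicts with {entries_eol}")
            diff.append(("+", line))
    return diff


print(commit_diff(b"abc\ndef\nghi\n", b"abc\r\nXXX\r\nghi\r\n", "CRLF"))
# [('-', b'def\n'), ('+', b'XXX\r\n')]
```

Suppose the text-base is empty and the working-copy is b"some\nbinary\r\ndata" under "LF". The sketch then raises ValueError, which is the error the commit rule asks for.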

Advantages
==========

 - Diffs on the wire and in the repository are small.
 - The working-copy file gets committed exactly and does not change.[1]
 - Any working-copy file that gets committed can always be retrieved
   exactly.
 - If an erroneously converted working-copy gets committed, the
   corruption does not, in general, get back into the repository.

[1] Any automatic conversion system has to allow the
conversion-enabling property to be unset. When this property change is
committed, the working copy needs to be changed to match the
repository. This applies whatever scheme we use. Perhaps it should
occur when the user does the propset rather than waiting until the
commit?

Disadvantages
=============

 - More complicated diff algorithm, I'm not even sure the vdelta
   algorithm can be made to operate this way.
 - Something I haven't thought of...

Examples
========

 Scenario 1: text file with native-line-end property
 ----------

 check-out:    text-base    CRLF working-copy
               abc\n        abc\r\n
               def\n        def\r\n
               ghi\n        ghi\r\n

 edit:         text-base    CRLF working-copy
               abc\n        abc\r\n
               def\n        XXX\r\n
               ghi\n        ghi\r\n

 diff:                      CRLF working-copy
                            -def\n
                            +XXX\r\n

 commit:       text-base    CRLF working-copy
               abc\n        abc\r\n
               XXX\r\n      XXX\r\n
               ghi\n        ghi\r\n

 Note that the working-copy does not need to change at commit, and
 remains what would appear if the user checked out on this platform.

 check-out:    text-base    LF working-copy
               abc\n        abc\n
               XXX\r\n      XXX\n
               ghi\n        ghi\n

 edit:         text-base    LF working-copy
               abc\n        abc\n
               XXX\r\n      YYY\n
               ghi\n        ghi\n

 diff:                      LF working-copy
                            -XXX\r\n
                            +YYY\n

 commit:       text-base    LF working-copy
               abc\n        abc\n
               YYY\n        YYY\n
               ghi\n        ghi\n

 Note that once again the working-copy does not need to change at
 commit.
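The commit step can be sketched the same way. Unchanged lines keep their repository bytes, and changed lines are stored exactly as they appear in the working-copy. The Python below is a hypothetical illustration: `new_text_base` is not a real Subversion function, and `difflib` again stands in for the real diff. It checks that converting the new text-base reproduces the working-copy, which is the claim that nothing needs to change at commit.

```python
import difflib

EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def line_ending(line: bytes) -> bytes:
    if line.endswith(b"\r\n"):
        return b"\r\n"
    if line.endswith((b"\n", b"\r")):
        return line[-1:]
    return b""


def convert(data: bytes, eol: bytes) -> bytes:
    # Check-out conversion: give every terminated line the `eol` ending.
    out = []
    for line in data.splitlines(keepends=True):
        ending = line_ending(line)
        out.append(line[:len(line) - len(ending)] + (eol if ending else b""))
    return b"".join(out)


def new_text_base(text_base: bytes, working: bytes, entries_eol: str) -> bytes:
    eol = EOL_BYTES[entries_eol]
    base = text_base.splitlines(keepends=True)
    # Per-line conversion keeps line i of `pristine` aligned with line i of `base`.
    pristine = [convert(line, eol) for line in base]
    wc = working.splitlines(keepends=True)
    result = []
    matcher = difflib.SequenceMatcher(a=pristine, b=wc, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Unchanged lines keep their repository bytes; changed lines are
        # committed exactly as they appear in the working-copy.
        result.extend(base[i1:i2] if tag == "equal" else wc[j1:j2])
    return b"".join(result)


# First half of Scenario 1: edit on a CRLF platform.
wc1 = b"abc\r\nXXX\r\nghi\r\n"
base1 = new_text_base(b"abc\ndef\nghi\n", wc1, "CRLF")
print(base1)                           # b'abc\nXXX\r\nghi\n'
print(convert(base1, b"\r\n") == wc1)  # True: working-copy need not change

# Second half: check out on an LF platform and edit again.
wc2 = b"abc\nYYY\nghi\n"
base2 = new_text_base(base1, wc2, "LF")
print(base2)                           # b'abc\nYYY\nghi\n'
print(convert(base2, b"\n") == wc2)    # True
```

Now feed the same function the corrupted CRLF working-copy from Scenario 3. It keeps the untouched lines as more\n and binary\n and stores only the edited stuffadded line. That is the "stable" corruption claimed there.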

 Scenario 2: binary file with erroneous native-line-end property
 ----------

 add:          text-base    LF working-copy

 The .svn/entries line-end value is LF, the platform native.

 edit:         text-base    LF working-copy
                            some\n
                            binary\r\n
                            data

 diff:                      LF working-copy
                            +some\n
                            +binary\r\n
                            +data

 Note that the diff contains line-end changes that are incompatible
 with the native-line-end property. This might trigger the error
 immediately, or it may be delayed until the commit. The commit fails
 unless the user removes the native-line-end property.

 commit:       text-base    LF working-copy
               some\n       some\n
               binary\r\n   binary\r\n
               data         data

 Note that this can only be committed without line-end conversion.

 Scenario 3: binary file with erroneous native-line-end property
 ----------

 add:          text-base    LF working-copy

 edit:         text-base    LF working-copy
                            more\n
                            binary\n
                            stuff

 commit:       text-base    LF working-copy
               more\n       more\n
               binary\n     binary\n
               stuff        stuff

 Here the binary does not have a conflicting line-ending, so the
 commit succeeds.

 check-out:    text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuff        stuff

 Here the working-copy is corrupt. If the user recognises this, the
 native-line-end property can be changed and committed. This, as in
 any other scheme, has to update the working-copy. Then the user has
 the correct binary file. If the user does not have commit access,
 they can use the -no-convert option to get a valid working-copy.

 check-out:    text-base    CRLF working-copy
 -no-convert   more\n       more\n
               binary\n     binary\n
               stuff        stuff
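Scenario 3 shows that a reversible translation is not necessarily a harmless one. Here is a small Python illustration. It assumes, as in the scenario, an LF text-base containing no CRs, so the CRLF conversion reduces to replacing every LF. `to_crlf` is a hypothetical helper.

```python
def to_crlf(data: bytes) -> bytes:
    # CRLF check-out conversion of an LF text-base. Valid only because this
    # text-base contains no CRs, so every LF is a source newline.
    return data.replace(b"\n", b"\r\n")


text_base = b"more\nbinary\nstuff"  # the binary committed from an LF platform
converted = to_crlf(text_base)       # ordinary check-out on a CRLF platform
no_convert = text_base               # check-out with -no-convert

print(len(text_base), len(converted))                   # 17 19
print(converted.replace(b"\r\n", b"\n") == text_base)   # True: reversible
print(converted == text_base, no_convert == text_base)  # False True
```

The conversion can be undone exactly, yet the converted working-copy is two bytes longer and is no longer the committed binary. Only the -no-convert check-out gives the user the real file.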

 If the corruption goes unnoticed and the user continues, the amount of
 corruption in the repository is "stable", i.e. the working-copy
 corruption will not get propagated into the repository, as follows:

 check-out:    text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuff        stuff

 Note that the working-copy is corrupt.

 edit:         text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuff        stuffadded

 diff:                      CRLF working-copy
                            -stuff
                            +stuffadded

 commit:       text-base    CRLF working-copy
               more\n       more\r\n
               binary\n     binary\r\n
               stuffadded   stuffadded

 Of course the resulting binary may be useless, but any scheme that
 does automatic line-end conversion can produce temporary corruption,
 and if this is not noticed problems will inevitably occur.

Hmm, 3:45am. "Time for bed," said Zebedee.

-- 
Philip
Received on Sat Oct 21 14:36:53 2006

This is an archived mail posted to the Subversion Dev mailing list.
