Yet another line-end proposal (YALEP?)

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2001-12-14 04:52:59 CET

Greg Hudson <ghudson@mit.edu> writes:

> 1. We can avoid irrevocably destroying data if we make sure all
> newline translations we do are reversible. A newline translation
> is reversible if there are no CRs or LFs in the file which aren't
> source-format newlines.

This is the property I have been trying to use.
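
As a rough illustration of that property, here is a minimal Python sketch. The names is_reversible and translate are hypothetical helpers for this example, not Subversion functions, and the translation is just a naive byte replace.

```python
def is_reversible(data: bytes, source_eol: bytes) -> bool:
    """Return True if every CR and LF in data is part of a source_eol."""
    leftover = data.replace(source_eol, b"")
    return b"\r" not in leftover and b"\n" not in leftover


def translate(data: bytes, src_eol: bytes, dst_eol: bytes) -> bytes:
    """Naive newline translation: replace each src_eol with dst_eol."""
    return data.replace(src_eol, dst_eol)


clean = b"abc\r\ndef\r\n"   # only CRLF newlines
mixed = b"abc\ndef\r\n"     # a stray LF that is not a CRLF newline

for data in (clean, mixed):
    round_trip = translate(translate(data, b"\r\n", b"\n"), b"\n", b"\r\n")
    print(is_reversible(data, b"\r\n"), round_trip == data)
```

The clean file survives the CRLF to LF to CRLF round trip; the mixed file does not, because the stray LF comes back as CRLF.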

Background (feel free to skip this para; it explains why I want this,
not how it works): I worked with ClearCase on a C++ project in a mixed
Unix/NT environment earlier this year. ClearCase does line-ending
conversion on a per-view basis for text files (a view is a working-
copy in subversion terms). We used views that were mounted on both
Unix and NT boxes, i.e. one view was simultaneously mounted on both
machines. This made cross platform development much easier as changes
could be built on both platforms without requiring them to be either
checked-in, or manually copied between views. However it caused real
problems if line-ending conversion was enabled. A view set up to use
NT line-endings, say, was hard to use from a Unix box, since every line
has already changed. Merging, which is normally a ClearCase strong
point, was disrupted if a set of line-end changes got erroneously
committed. We ended up abandoning the line-end conversion, and using
pre-commit triggers to produce the line-endings we wanted (.ds[pw]'s
had CRLF, source code plain LF). This initially disposes me to
support no line-end conversion, and for the default to be off if it is
present. However given that lots of people want it, take a deep breath
and here goes.

Proposal
========

Rules:

- The text-base always duplicates the repository.

- Any sort of line-ending can appear in any file in the repository.

- There is a native-line-end property that can be set on a file. I am
not sure if this is a separate property from the text/binary thing
as I am not sure what the text/binary thing does at present!

Rules when native-line-end is not set:

- If the property is not set, no line-end conversion occurs. The
working-copy duplicates the repository. Files get committed exactly
as they appear in the working-copy, just your straight binary file.

Rules when native-line-end is set:

- At check-out/update/revert convert all line-endings in the working
   copy to whatever the platform requires. Store the platform line-end
   property in the .svn/entries file (or wherever) to allow checkout
   with Unix client and check-in with non-Unix client or vice
   versa. The .svn/entries property is "none" or "LF" or "CRLF" etc.,
   i.e. an explicit line-ending and not just "native".

- At check-out/update/revert there is a -no-convert option to disable
line-end conversion, overriding the native-line-end property. This
also changes the line-end property in the .svn/entries file.

- At commit check the .svn/entries file to determine the
   line-end property. When generating the diff between the
   working-copy and the text-base if a line-end difference is
   explained by the line-ending conversion ignore it. If the
   introduced line-endings are incompatible with the .svn/entries
   line-end property display an error.
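
The check-out side of these rules might look something like the sketch below. It is only an illustration under stated assumptions: check_out and EOLS are hypothetical names, a lone CR is treated as a line-ending too, and recording "none" when the property is unset or -no-convert is given is my reading of the rules rather than something spelled out above.

```python
import re

# Explicit line-end values that could be recorded in .svn/entries.
EOLS = {"LF": b"\n", "CRLF": b"\r\n"}


def check_out(text_base: bytes, platform: str,
              native_line_end: bool, no_convert: bool = False):
    """Return (working_copy_bytes, line_end_value_for_entries)."""
    if not native_line_end or no_convert:
        # No conversion: the working-copy duplicates the text-base.
        return text_base, "none"
    eol = EOLS[platform]
    # Convert every line-ending (CRLF, lone CR, or LF) to the platform's.
    working = re.sub(rb"\r\n|\r|\n", lambda m: eol, text_base)
    return working, platform


print(check_out(b"abc\nXXX\r\nghi\n", "LF", native_line_end=True))
print(check_out(b"more\nbinary\n", "CRLF", native_line_end=True,
                no_convert=True))
```

Recording the explicit value rather than "native" is what lets a later commit from a different client know which conversion was applied.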

Diff Algorithm:

The diff algorithm is basically as follows: do the line-ending
conversion specified in the .svn/entries file on the text-base to
generate the pristine working-copy. Diff the pristine working-copy
and the actual working copy. Within the diff, undo the line-ending
conversion on the diff for those parts that represent the
text-base. Within the diff, verify that all line-endings in those
parts that represent the working-copy are consistent with the
.svn/entries property. This diff is now suitable to send to the
repository.
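
The steps above can be sketched in Python as follows. This is a line-based approximation using difflib, not vdelta and not Subversion code; line_end_diff, split_lines and ending_of are hypothetical names, and treating a lone CR as a line-ending is an assumption.

```python
import re
from difflib import SequenceMatcher

EOL_RE = re.compile(rb"\r\n|\r|\n")
LINE_RE = re.compile(rb"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def split_lines(data: bytes):
    """Split into lines, keeping each line's own ending."""
    return LINE_RE.findall(data)


def ending_of(line: bytes) -> bytes:
    """Return the line's ending, or b"" for a final unterminated line."""
    for eol in (b"\r\n", b"\n", b"\r"):
        if line.endswith(eol):
            return eol
    return b""


def line_end_diff(text_base: bytes, working: bytes, eol: bytes):
    """Return a list of ("-", text_base_line) and ("+", working_line)."""
    base = split_lines(text_base)
    # Step 1: pristine working-copy = text-base with endings converted.
    pristine = [EOL_RE.sub(lambda m: eol, line) for line in base]
    work = split_lines(working)
    out = []
    # Step 2: diff the pristine working-copy against the actual one.
    matcher = SequenceMatcher(a=pristine, b=work, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Step 3: undo the conversion on the text-base side by emitting
        # the original text-base lines (same indices as pristine).
        out += [("-", line) for line in base[i1:i2]]
        # Step 4: new working-copy lines must use the recorded ending.
        for line in work[j1:j2]:
            if ending_of(line) not in (eol, b""):
                raise ValueError("incompatible line-ending: %r" % line)
            out.append(("+", line))
    return out


print(line_end_diff(b"abc\ndef\nghi\n", b"abc\r\nXXX\r\nghi\r\n", b"\r\n"))
```

For the first edit in Scenario 1 this yields a removal of def\n and an addition of XXX\r\n, and it raises an error for the working-copy in Scenario 2.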

Advantages
==========

- On-the-wire and repository diffs are small.
- The working copy file gets committed exactly and does not change.[1]
- Any working-copy file that gets committed can always be retrieved
exactly.
- If an erroneously converted working-copy gets committed the
corruption does not in general get back into the repository.

[1] Any automatic conversion system has to allow the conversion
enabling property to be unset. When this property change is committed
the working copy needs to be changed to match the repository. This
applies whatever scheme we use. Perhaps it should occur when the user
does the propset rather than waiting until the commit?

Disadvantages
=============

- More complicated diff algorithm, I'm not even sure the vdelta
algorithm can be made to operate this way.
- Something I haven't thought of...

Examples
========

Scenario 1: text file with native-line-end property
----------

check-out: text-base CRLF working-copy
               abc\n abc\r\n
               def\n def\r\n
               ghi\n ghi\r\n

edit: text-base CRLF working-copy
               abc\n abc\r\n
               def\n XXX\r\n
               ghi\n ghi\r\n

diff: CRLF working-copy
-def\n
+XXX\r\n

commit: text-base CRLF working-copy
               abc\n abc\r\n
               XXX\r\n XXX\r\n
               ghi\n ghi\r\n

Note that the working-copy does not need to change at commit, and
remains what would appear if the user checked-out on this platform.

check-out: text-base LF working-copy
               abc\n abc\n
               XXX\r\n XXX\n
               ghi\n ghi\n

edit: text-base LF working-copy
               abc\n abc\n
               XXX\r\n YYY\n
               ghi\n ghi\n

diff: LF working-copy
-XXX\r\n
+YYY\n

commit: text-base LF working-copy
               abc\n abc\n
               YYY\n YYY\n
               ghi\n ghi\n

Note that once again the working-copy does not need to change at
commit.
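
To check the text-base contents shown in Scenario 1, here is a small Python sketch of what the commit does to the text-base. It is an assumption-laden illustration: commit is a hypothetical function, the comparison is line-based via difflib, and unchanged lines are taken to keep their repository bytes while changed lines take the working-copy bytes.

```python
import re
from difflib import SequenceMatcher

LINE_RE = re.compile(rb"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")
EOL_RE = re.compile(rb"\r\n|\r|\n")


def commit(text_base: bytes, working: bytes, eol: bytes) -> bytes:
    """Return the new text-base after committing the working-copy."""
    base = LINE_RE.findall(text_base)
    work = LINE_RE.findall(working)
    # Pristine working-copy: text-base lines with converted endings.
    pristine = [EOL_RE.sub(lambda m: eol, line) for line in base]
    new_base = []
    matcher = SequenceMatcher(a=pristine, b=work, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_base += base[i1:i2]   # untouched: keep repository bytes
        else:
            new_base += work[j1:j2]   # changed: take working-copy bytes
    return b"".join(new_base)


# CRLF client edits "def" to "XXX", then an LF client edits it to "YYY".
rev1 = commit(b"abc\ndef\nghi\n", b"abc\r\nXXX\r\nghi\r\n", b"\r\n")
rev2 = commit(rev1, b"abc\nYYY\nghi\n", b"\n")
print(rev1)  # b'abc\nXXX\r\nghi\n'
print(rev2)  # b'abc\nYYY\nghi\n'
```

Only the edited line picks up the committing platform's line-ending; the untouched lines stay as they were in the repository.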

Scenario 2: binary file with erroneous native-line-end property
----------

add: text-base LF working-copy

The .svn/entries line-end indicates LF, the platform native.

edit: text-base LF working-copy
                            some\n
                            binary\r\n
                            data

diff: LF working-copy
                           +some\n
                           +binary\r\n
                           +data

Note that the diff contains line-end changes that are incompatible
with the native-line-end property. This might trigger the error, or
it may be delayed until the commit. The commit fails unless the user
removes the native-line-end property.

commit: text-base LF working-copy
               some\n some\n
               binary\r\n binary\r\n
               data data

Note that this can only be committed without line-end conversion.

Scenario 3: binary file with erroneous native-line-end property
----------

add: text-base LF working-copy

edit: text-base LF working-copy
                            more\n
                            binary\n
                            stuff

commit: text-base LF working-copy
               more\n more\n
               binary\n binary\n
               stuff stuff

Here the binary does not have a conflicting line-ending, so the
commit succeeds.

check-out: text-base CRLF working-copy
               more\n more\r\n
               binary\n binary\r\n
               stuff stuff

Here the working-copy is corrupt. If the user recognises this the
native-line-end property can be changed and committed. This, as in any
other scheme, has to update the working-copy. Then the user has the
correct binary file. If the user does not have commit access, they
can use the -no-convert option to get a valid working-copy.

check-out: text-base CRLF working-copy
-no-convert    more\n more\n
               binary\n binary\n
               stuff stuff

If the corruption is unnoticed, and the user continues, the amount of
corruption in the repository is "stable", i.e. the working copy
corruption will not get propagated into the repository. As follows:

check-out: text-base CRLF working-copy
               more\n more\r\n
               binary\n binary\r\n
               stuff stuff

Note the working-copy is corrupt.

edit: text-base CRLF working-copy
               more\n more\r\n
               binary\n binary\r\n
               stuff stuffadded

diff: CRLF working-copy
-stuff
+stuffadded

commit: text-base CRLF working-copy
               more\n more\r\n
               binary\n binary\r\n
               stuffadded stuffadded

Of course the resulting binary may be useless, but any scheme that
does automatic line-end conversion can produce temporary corruption,
and if this is not noticed problems will inevitably occur.

Hmm, 3:45am, time for bed said Zebedee

-- 
Philip

Received on Sat Oct 21 14:36:53 2006
