Greg Hudson <ghudson@mit.edu> writes:
> 1. We can avoid irrevocably destroying data if we make sure all
> newline translations we do are reversible. A newline translation
> is reversible if there are no CRs or LFs in the file which aren't
> source-format newlines.
This is the property I have been trying to use.
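
As a rough illustration of that property, here is a small Python sketch (not Subversion code; the name `is_reversible` and its parameters are invented here for the example). It assumes the file is handled as raw bytes and that the source-format newline is passed in explicitly.

```python
def is_reversible(data: bytes, source_eol: bytes) -> bool:
    """Return True if every CR and LF in data is part of a source_eol."""
    # Strip out every source-format newline; any CR or LF that is left
    # over is a "stray" that a translation could not restore afterwards.
    remainder = data.replace(source_eol, b"")
    return b"\r" not in remainder and b"\n" not in remainder


if __name__ == "__main__":
    print(is_reversible(b"abc\r\ndef\r\n", b"\r\n"))       # pure CRLF text
    print(is_reversible(b"some\nbinary\r\ndata", b"\n"))   # mixed endings
```

A file with mixed endings, like the binary data in Scenario 2 further down, fails this test whichever newline is taken as the source format.
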
Background (feel free to skip this paragraph; it explains why I want this,
not how it works): I worked with ClearCase on a C++ project in a mixed
Unix/NT environment earlier this year. ClearCase does line-ending
conversion on a per-view basis for text files (a view is a working-
copy in subversion terms). We used views that were mounted on both
Unix and NT boxes, i.e. one view was simultaneously mounted on both
machines. This made cross platform development much easier as changes
could be built on both platforms without requiring them to be either
checked-in, or manually copied between views. However it caused real
problems if line-ending conversion was enabled. A view set up to use
NT line-endings, say, was hard to use from a Unix box, since every line
had already changed. Merging, which is normally a ClearCase strong
point, was disrupted if a set of line-end changes got erroneously
committed. We ended up abandoning the line-end conversion, and using
pre-commit triggers to produce the line-endings we wanted (.ds[pw]'s
had CRLF, source code plain LF). This initially disposes me to
support no line-end conversion, and for the default to be off if it is
present. However given that lots of people want it, take a deep breath
and here goes.
Proposal
========
Rules:
- The text-base always duplicates the repository.
- Any sort of line-ending can appear in any file in the repository.
- There is a native-line-end property that can be set on a file. I am
not sure if this is a separate property from the text/binary thing
as I am not sure what the text/binary thing does at present!
Rules when native-line-end is not set:
- If the property is not set, no line-end conversion occurs. The
working-copy duplicates the repository. Files get committed exactly
as they appear in the working-copy, just like your straight binary file.
Rules when native-line-end is set:
- At check-out/update/revert convert all line-endings in the working
copy to whatever the platform requires. Store the platform line-end
property in the .svn/entries file (or wherever) to allow checkout
with Unix client and check-in with non-Unix client or vice
versa. The .svn/entries property is "none" or "LF" or "CRLF" etc.,
i.e. an explicit line-ending and not just "native".
- At check-out/update/revert there is a -no-convert option to disable
line-end conversion, overriding the native-line-end property. This
also changes the line-end property in the .svn/entries file.
- At commit, check the .svn/entries file to determine the
line-end property. When generating the diff between the
working-copy and the text-base, ignore any line-end difference that
is explained by the line-ending conversion. If the introduced
line-endings are incompatible with the .svn/entries line-end
property, display an error.
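
The check-out/update/revert rules above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `checkout`, `EOL_BYTES` and the parameter names are invented for the example, a lone CR is treated as a line-ending (the "etc." above does not say), and the second return value stands in for whatever would be written to .svn/entries.

```python
import re

# Explicit line-end names as they might be stored in .svn/entries.
# "CR" is an assumption of this sketch; the proposal only names LF and CRLF.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n", "CR": b"\r"}

# Match CRLF first so that it is not seen as a CR followed by an LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def checkout(text_base: bytes, native_line_end: bool,
             platform_eol: str, no_convert: bool = False):
    """Return (working_copy_bytes, line_end_value_for_entries)."""
    if not native_line_end or no_convert:
        # No conversion: the working-copy duplicates the text-base.
        return text_base, "none"
    eol = EOL_BYTES[platform_eol]
    # Convert every line-ending to the platform's explicit ending.
    working = _ANY_EOL.sub(lambda match: eol, text_base)
    return working, platform_eol


if __name__ == "__main__":
    print(checkout(b"abc\nXXX\r\nghi\n", True, "LF"))
    print(checkout(b"more\nbinary\nstuff", True, "CRLF", no_convert=True))
```

Recording the explicit value ("LF", "CRLF" or "none") rather than "native" is what lets a different client later work out which conversion was applied.
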
Diff Algorithm:
The diff algorithm is basically as follows: do the line-ending
conversion specified in the .svn/entries file on the text-base to
generate the pristine working-copy. Diff the pristine working-copy
and the actual working copy. Within the diff, undo the line-ending
conversion on the diff for those parts that represent the
text-base. Within the diff, verify that all line-endings for those
parts that represent the working-copy are consistent with the
.svn/entries property. This diff is now suitable to send to the
repository.
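
To make the steps concrete, here is a line-based Python sketch of the diff algorithm. It is a sketch under assumptions, not a claim about how vdelta or the Subversion libraries work: it uses the standard `difflib.SequenceMatcher` as a stand-in diff engine, and `commit_diff`, `split_eol` and `EOL_BYTES` are names made up for the example. The result is a list of ('-', line) and ('+', line) pairs without positions, which is enough to show which bytes end up on each side.

```python
import difflib

EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def split_eol(line: bytes):
    """Split one line into (content, ending); ending is b"" if absent."""
    for ending in (b"\r\n", b"\n", b"\r"):
        if line.endswith(ending):
            return line[:-len(ending)], ending
    return line, b""


def commit_diff(text_base: bytes, working: bytes, entries_eol: str):
    """Diff working-copy against text-base, ignoring converted endings."""
    base_lines = text_base.splitlines(keepends=True)
    work_lines = working.splitlines(keepends=True)
    eol = EOL_BYTES.get(entries_eol)  # None when the property is "none"

    # Step 1: build the pristine working-copy from the text-base.
    pristine = []
    for line in base_lines:
        content, ending = split_eol(line)
        pristine.append(content + eol if (eol and ending) else line)

    # Step 2: diff the pristine working-copy against the actual one.
    matcher = difflib.SequenceMatcher(None, pristine, work_lines,
                                      autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Step 3: undo the conversion on the text-base side by emitting
        # the original text-base lines rather than the pristine ones.
        result.extend(("-", line) for line in base_lines[i1:i2])
        # Step 4: verify the working-copy side against .svn/entries.
        for line in work_lines[j1:j2]:
            _, ending = split_eol(line)
            if eol and ending not in (b"", eol):
                raise ValueError("line-ending %r conflicts with %s"
                                 % (ending, entries_eol))
            result.append(("+", line))
    return result


if __name__ == "__main__":
    print(commit_diff(b"abc\ndef\nghi\n", b"abc\r\nXXX\r\nghi\r\n", "CRLF"))
```

Because the pristine lines map one-to-one onto the text-base lines, "undoing" the conversion on the removed side is just an index lookup back into the text-base.
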
Advantages
==========
- On-the-wire and repository diffs are small.
- The working copy file gets committed exactly and does not change.[1]
- Any working-copy file that gets committed can always be retrieved
exactly.
- If an erroneously converted working-copy gets committed, the
corruption does not in general get back into the repository.
[1] Any automatic conversion system has to allow the conversion
enabling property to be unset. When this property change is committed
the working copy needs to be changed to match the repository. This
applies whatever scheme we use. Perhaps it should occur when the user
does the propset rather than waiting until the commit?
Disadvantages
=============
- More complicated diff algorithm; I'm not even sure the vdelta
algorithm can be made to operate this way.
- Something I haven't thought of...
Examples
========
Scenario 1: text file with native-line-end property
----------
check-out:  text-base       CRLF working-copy
            abc\n           abc\r\n
            def\n           def\r\n
            ghi\n           ghi\r\n

edit:       text-base       CRLF working-copy
            abc\n           abc\r\n
            def\n           XXX\r\n
            ghi\n           ghi\r\n

diff:       CRLF working-copy
            -def\n
            +XXX\r\n

commit:     text-base       CRLF working-copy
            abc\n           abc\r\n
            XXX\r\n         XXX\r\n
            ghi\n           ghi\r\n

Note that the working-copy does not need to change at commit, and
remains what would appear if the user checked-out on this platform.

check-out:  text-base       LF working-copy
            abc\n           abc\n
            XXX\r\n         XXX\n
            ghi\n           ghi\n

edit:       text-base       LF working-copy
            abc\n           abc\n
            XXX\r\n         YYY\n
            ghi\n           ghi\n

diff:       LF working-copy
            -XXX\r\n
            +YYY\n

commit:     text-base       LF working-copy
            abc\n           abc\n
            YYY\n           YYY\n
            ghi\n           ghi\n

Note that once again the working-copy does not need to change at
commit.
Scenario 2: binary file with erroneous native-line-end property
----------
add: text-base LF working-copy
The .svn/entries line-end indicates LF, the platform native.

edit:       text-base       LF working-copy
                            some\n
                            binary\r\n
                            data

diff:       LF working-copy
            +some\n
            +binary\r\n
            +data

Note that the diff contains line-end changes that are incompatible
with the native-line-end property. This might trigger the error, or
it may be delayed until the commit. The commit fails unless the user
removes the native-line-end property.

commit:     text-base       LF working-copy
            some\n          some\n
            binary\r\n      binary\r\n
            data            data

Note that this can only be committed without line-end conversion.
Scenario 3: binary file with erroneous native-line-end property
----------
add: text-base LF working-copy

edit:       text-base       LF working-copy
                            more\n
                            binary\n
                            stuff

commit:     text-base       LF working-copy
            more\n          more\n
            binary\n        binary\n
            stuff           stuff

Here the binary does not have a conflicting line-ending, so the
commit succeeds.

check-out:  text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuff           stuff

Here the working-copy is corrupt. If the user recognises this the
native-line-end property can be changed and committed. This, as in any
other scheme, has to update the working-copy. Then the user has the
correct binary file. If the user does not have commit access, they
can use the -no-convert option to get a valid working-copy.

check-out:  text-base       CRLF working-copy
-no-convert more\n          more\n
            binary\n        binary\n
            stuff           stuff

If the corruption is unnoticed, and the user continues, the amount of
corruption in the repository is "stable", i.e. the working copy
corruption will not get propagated into the repository, as follows:

check-out:  text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuff           stuff

Note that the working-copy is corrupt.

edit:       text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuff           stuffadded

diff:       CRLF working-copy
            -stuff
            +stuffadded

commit:     text-base       CRLF working-copy
            more\n          more\r\n
            binary\n        binary\r\n
            stuffadded      stuffadded

Of course the resulting binary may be useless, but any scheme that
does automatic line-end conversion can produce temporary corruption,
and if this is not noticed problems will inevitably occur.
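
The "stable" claim in Scenario 3 can be checked with a short Python sketch of what the text-base would look like after a commit. As before this is an illustration under assumptions: `committed_text_base` and `to_platform` are invented names, the standard `difflib.SequenceMatcher` stands in for the real diff engine, and the work is done line by line.

```python
import difflib

EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def to_platform(line: bytes, eol: bytes) -> bytes:
    """Rewrite the ending of one line (if it has one) to eol."""
    for ending in (b"\r\n", b"\n", b"\r"):
        if line.endswith(ending):
            return line[:-len(ending)] + eol
    return line


def committed_text_base(text_base: bytes, working: bytes,
                        entries_eol: str) -> bytes:
    """Return the text-base that results from committing working."""
    eol = EOL_BYTES[entries_eol]
    base_lines = text_base.splitlines(keepends=True)
    work_lines = working.splitlines(keepends=True)
    pristine = [to_platform(line, eol) for line in base_lines]

    new_base = []
    matcher = difflib.SequenceMatcher(None, pristine, work_lines,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unedited lines keep their repository bytes, so the
            # converted (corrupt) endings never reach the repository.
            new_base.extend(base_lines[i1:i2])
        else:
            # Edited lines are taken from the working-copy exactly.
            new_base.extend(work_lines[j1:j2])
    return b"".join(new_base)


if __name__ == "__main__":
    print(committed_text_base(b"more\nbinary\nstuff",
                              b"more\r\nbinary\r\nstuffadded", "CRLF"))
```

Only the edited line changes in the result; the two lines whose endings were converted at check-out come back from the text-base untouched.
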
Hmm, 3:45am, time for bed said Zebedee
--
Philip
Received on Sat Oct 21 14:36:53 2006