
Re: Ascii/binary detection.

From: Branko Čibej <brane_at_xbc.nu>
Date: 2001-08-01 01:33:21 CEST

cmpilato@collab.net wrote:

>According to the alpha-checklist.txt file, the problems--no, the
>"challenges!"--of text/binary detection, keyword substitution, and
>newline translation "will be solved by three separate properties":
>mime-type, keyword substitution, and newline translation. Branko,
>this is basically your brainchild, so I hope you're paying attention
>to what follows (with your editor's red pen in hand!)
Got a big, fat, orange fluorescent marker here, hur hur hur!

>Confused as to why the mime-type was needed, I asked Karl the
>reasoning behind this property. Karl explained that we want to allow
>files to be marked as wanting newline conversion and/or keyword
>substitution, but allow those attributes to be ignored for binary
>files. That way users can enable those features for, say, '*' in
>their working copy directory, and not have to worry about whether or
>not each target in that directory is an ascii file. This is important
>because a file's binariness can switch back and forth over the file's
Right. Note that when we were discussing this (damn, don't have that
archive any more ...), someone pointed out that the "text/*" mime types
actually imply CRLF line endings. But I think we can safely ignore that;
Subversion is not an MUA.

>I'm proposing the following:
>1. Develop a heuristic for determining the binariness of a file, say
> svn_io_is_binary_file ()
(Two suggestions: a) don't mark the file as binary just because there's
a byte with value >= 128 in it; b) if other tests aren't conclusive,
check for extremely long lines in the file?)
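Suggestions (a) and (b) could be combined into a heuristic along these lines. This is a minimal sketch in C, not the svn_io_is_binary_file () proposed above: it works on an in-memory buffer (say, the first few kilobytes of the file), and the helper name looks_binary and the 1024-byte line limit are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold, not from the thread: a "line" longer than this
   is taken as a sign of binary data. */
#define MAX_TEXT_LINE_LEN 1024

/* Hypothetical helper: guess whether BUF (e.g. the first few KB of a
   file) holds binary data.  Bytes >= 128 are NOT treated as binary,
   since they are common in Latin-1 and UTF-8 text. */
static bool looks_binary(const unsigned char *buf, size_t len)
{
    size_t line_len = 0, max_line_len = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        /* A NUL byte almost never occurs in text: conclusive. */
        if (c == '\0')
            return true;

        /* Line boundary: remember the longest line seen so far. */
        if (c == '\n' || c == '\r') {
            if (line_len > max_line_len)
                max_line_len = line_len;
            line_len = 0;
            continue;
        }

        /* Other C0 control characters, except TAB, BS, FF and ESC
           (which do show up in text files), are also conclusive. */
        if (c < 0x20 && c != '\t' && c != '\b' && c != '\f' && c != 0x1b)
            return true;

        line_len++;
    }
    if (line_len > max_line_len)
        max_line_len = line_len;

    /* Inconclusive so far: fall back to the extremely-long-line test. */
    return max_line_len > MAX_TEXT_LINE_LEN;
}
```

Only the NUL and control-character tests are treated as conclusive; the long-line check is the fallback that suggestion (b) describes. A file-based svn_io_is_binary_file () could read a prefix of the file and hand it to a helper like this.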

>2. During `svn add', svn_io_is_binary_file () is called (only on
> files, of course). If it returns TRUE, the property
> `svn:mime-type' is set on the file with a value of
> `application/octet-stream'.
What do you think about following the HTTP convention here? Call the
property svn:content-type, and encode the character set, too? Not that
we'll do anything with that info in 1.0.

> [NOTE: It'd be really nice at this
> point for the UI to say either "Added binary file foo" or "Added
> text file foo", but then again, it'd be nice if the UI said
> *anything* during an add].
I agree. Since the heuristic can't be 100% accurate, we definitely have
to say what we guess about the file.

>3. At this point, the user can use `svn propset' (or `svn propdel')
> to change the values of svn:mime-type, svn:line-ending, and
> svn:keywords. We can also provide convenience subcommands for
> making these special property modifications, too (but don't make
> me pull out any -kkv's or anything!)
Just have 'svn add' accept --text and --binary, and possibly
--content-type=..., --end-of-line=...
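Steps 2 and 3, together with the --text, --binary and --content-type options suggested here, boil down to a precedence rule for what `svn add' stores in svn:mime-type. A minimal sketch in C: struct add_options, mime_type_for_add and add_label are hypothetical names, the options are this thread's proposal rather than existing svn flags, and the heuristic's verdict is passed in so the sketch stands alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of the proposed `svn add' options. */
struct add_options {
    bool force_text;           /* --text */
    bool force_binary;         /* --binary */
    const char *content_type;  /* --content-type=..., or NULL if absent */
};

/* Value to store in svn:mime-type for a newly added file, or NULL if
   no property should be set (the file is then treated as text).
   Explicit options override the heuristic; an explicit content type
   is most specific, so it is checked first, and --binary wins over
   --text if both are given. */
static const char *mime_type_for_add(const struct add_options *opts,
                                     bool heuristic_says_binary)
{
    if (opts->content_type != NULL)
        return opts->content_type;
    if (opts->force_binary)
        return "application/octet-stream";
    if (opts->force_text)
        return NULL;
    return heuristic_says_binary ? "application/octet-stream" : NULL;
}

/* Word for the "Added binary file foo" / "Added text file foo"
   message, so the user always sees what was guessed. */
static const char *add_label(const char *mime_type)
{
    if (mime_type == NULL || strncmp(mime_type, "text/", 5) == 0)
        return "text";
    return "binary";
}
```

A caller would then print something like printf ("Added %s file %s\n", add_label (type), path), which covers the NOTE in step 2 as well.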

>Now, a word about the values of these three properties.
> svn:mime-type
> If this property is present on a given file, its value is used to
> determine the binary-ness of the contents of that file. Values
> for this can really be just about anything, but some notable ones
> are:
>   'application/octet-stream' - Generic binary file. No keyword
>                                substitution or newline translation
>                                will occur on this file. Also,
>                                `svn diff' will not try to display a
>                                diff for this file.
Why not keyword substitution? Just make it off by default. If the user
wants keyword substitution in binary files, we can always let him shoot
himself in the foot. Besides, it can actually make sense in some kinds
of binary formats.

>   'text/foo' - Text file (where 'foo' is some mime subtype like
>                'plain' or 'html'). Keyword substitution and newline
>                translation are available for this file, and
>                `svn diff' will actually display diffs for it.
    'anything/else' - Treat as generic binary, for now. Later on we
                      can hang custom-diff hooks and other nice stuff
                      on that.
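The value rules above (text/* is text, anything else is generic binary for now), plus two amendments from this reply (an HTTP-style value such as "text/plain; charset=ISO-8859-1", and keyword substitution that is off by default rather than forbidden for binary files), can be condensed into one policy function. A sketch in C; struct translation_policy, policy_for and has_prefix_ci are made-up names, and treating an absent property as text follows step 2 of the proposal.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

struct translation_policy {
    bool eol_translation;  /* may newlines be translated? */
    bool expand_keywords;  /* will keywords be substituted? */
    bool show_diff;        /* will `svn diff' print a textual diff? */
};

/* Case-insensitive prefix test (media types are case-insensitive). */
static bool has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;
    return true;
}

/* MIME_TYPE is the property value, or NULL if the property is absent.
   Parameters after the subtype ("; charset=...") are simply ignored.
   HAS_KEYWORDS_PROP says whether svn:keywords is set on the file. */
static struct translation_policy policy_for(const char *mime_type,
                                            bool has_keywords_prop)
{
    /* No property: `svn add' judged the file to be text.  Any type
       other than text/... is treated as generic binary for now. */
    bool is_text = (mime_type == NULL) || has_prefix_ci(mime_type, "text/");
    struct translation_policy p;

    p.eol_translation = is_text;
    p.show_diff = is_text;
    /* Off unless svn:keywords asks for it, even for binary files.
       The original proposal would use (is_text && has_keywords_prop). */
    p.expand_keywords = has_keywords_prop;
    return p;
}
```

The only place the two positions differ is the expand_keywords line; everything else follows the proposal as written.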

> svn:line-ending
> If this property is present on a given non-binary file, its value
> is used to determine how line-endings should be translated.
> Values for this can be:
>   'native' - Use the line ending mechanism native to the user's
>              operating system.
>   'dos', 'unix', or 'mac' - Use CRLF, LF, or LFCR, respectively.
I'm not sure what the correct 'mac' line ending is. Have to check that.

There are (used to be?) systems where lines are delimited from both
ends. On VMS, a line started with a LF and ended with a CR, IIRC. How
about a more generic approach: the value of this property is a pair of
strings, one for the BOL and one for the EOL marker. 'native' would
still have the same meaning, while 'dos', 'unix' and 'mac' would be
aliases for ':\r\n', ':\n' and ':\n\r' (or whatever), respectively. A
VMS guy would make 'native' an alias for '\n:\r'.

(And someone porting SVN to the ZX Spectrum will define 'native' as
':\r' -- then run out of memory when compiling neon :-)
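The BOL/EOL pair idea could be parsed as below. This is a sketch in C under assumptions the thread does not settle: the property stores the escapes as text (a backslash followed by r or n), the colon separates BOL from EOL and cannot itself appear in a marker, 'none' maps to ':' as suggested further down, 'native' is fixed at compile time, and 'mac' maps to a lone CR (what classic Mac OS used) rather than the 'LFCR' quoted above. struct line_markers, parse_line_ending and its helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MARKER_MAX 8  /* assumed limit, including the terminating NUL */

struct line_markers {
    char bol[MARKER_MAX];  /* written at the beginning of each line */
    char eol[MARKER_MAX];  /* written at the end of each line */
};

/* Decode LEN chars of SRC into DST, turning \r, \n and \\ into CR, LF
   and backslash.  Returns false on an unknown escape or overflow. */
static bool decode_marker(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '\\') {
            if (++i >= len)
                return false;
            if (src[i] == 'r')       c = '\r';
            else if (src[i] == 'n')  c = '\n';
            else if (src[i] == '\\') c = '\\';
            else                     return false;
        }
        if (out + 1 >= MARKER_MAX)
            return false;
        dst[out++] = c;
    }
    dst[out] = '\0';
    return true;
}

/* Map the named aliases to their "BOL:EOL" spelling. */
static const char *expand_alias(const char *value)
{
    static const struct { const char *name; const char *pair; } aliases[] = {
        { "none", ":" },
        { "unix", ":\\n" },
        { "dos",  ":\\r\\n" },
        { "mac",  ":\\r" },
#ifdef _WIN32
        { "native", ":\\r\\n" },
#else
        { "native", ":\\n" },
#endif
    };
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp(value, aliases[i].name) == 0)
            return aliases[i].pair;
    return value;  /* not an alias: assume a literal "BOL:EOL" pair */
}

static bool parse_line_ending(const char *value, struct line_markers *m)
{
    const char *pair = expand_alias(value);
    const char *colon = strchr(pair, ':');
    if (colon == NULL)
        return false;
    return decode_marker(pair, (size_t)(colon - pair), m->bol)
        && decode_marker(colon + 1, strlen(colon + 1), m->eol);
}
```

With this, the VMS case from above is just the literal value "\n:\r" (a BOL of LF and an EOL of CR), and no special code is needed for it.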

> Absence of this property means that no line-ending substitution
> should occur at all.
Um. I'd rather use 'none' (':', if you accept the idea outlined above),
and make 'native' the default for text files. Oh, and we have to
prescribe the repository's native format, so that we can send deltas
back and forth.
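Prescribing one repository format means the client translates on the way in and on the way out, so the deltas sent back and forth never depend on the client's platform. A sketch in C, assuming LF as the repository form (the thread leaves that choice open); to_repository_form and to_working_form are hypothetical names, and a file whose line-ending value is 'none' would simply bypass both.

```c
#include <stddef.h>
#include <string.h>

/* Commit direction: normalize CRLF, lone CR and LF to LF, the assumed
   repository form.  OUT must hold at least LEN bytes (the output never
   grows).  Returns the number of bytes written. */
static size_t to_repository_form(const char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\r') {
            out[o++] = '\n';
            if (i + 1 < len && in[i + 1] == '\n')
                i++;  /* swallow the LF of a CRLF pair */
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}

/* Checkout direction: expand each LF to the working copy's EOL string
   (non-empty, e.g. "\r\n").  OUT must hold at least LEN * strlen (EOL)
   bytes, the worst case of a buffer made only of LFs. */
static size_t to_working_form(const char *in, size_t len,
                              const char *eol, char *out)
{
    size_t eol_len = strlen(eol);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\n') {
            memcpy(out + o, eol, eol_len);
            o += eol_len;
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```

Because commit normalizes any mix of line endings, a file edited on both DOS and Unix clients still produces a clean delta against the previous revision.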

> svn:keywords
> If this property is present on a given non-binary file, its value
> is used to determine which keywords will be substituted in that
> file. The value is expected to be a comma-delimited list of
> keywords from the following set:
> 'author' - replaces the keyword placeholder $Author$
> 'date' - replaces the keyword placeholder $Date$
> 'header' - replaces the keyword placeholder $Header$
> 'revision' - replaces the keyword placeholder $Revision$
> ...and maybe some others, depending on whatch'all want.
> Absence of this property means that no keyword substitution should
> occur at all.
Hum. Can't find a nit here. How sad. :-)
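For svn:keywords, parsing the comma-delimited list and expanding the placeholders might look like the sketch below. The expanded form "$Author: value $" is borrowed from CVS and is an assumption, as are all the names; the actual values (author, date, and so on) are supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum {
    KW_AUTHOR   = 1 << 0,
    KW_DATE     = 1 << 1,
    KW_HEADER   = 1 << 2,
    KW_REVISION = 1 << 3
};

/* Property names (as listed above) and the placeholders they enable. */
static const struct {
    const char *prop_name;    /* as written in svn:keywords */
    const char *placeholder;  /* as written in the file: $Name$ */
    unsigned flag;
} keywords[] = {
    { "author",   "Author",   KW_AUTHOR },
    { "date",     "Date",     KW_DATE },
    { "header",   "Header",   KW_HEADER },
    { "revision", "Revision", KW_REVISION },
};
#define N_KEYWORDS (sizeof keywords / sizeof keywords[0])

/* Parse a list such as "author, revision" into a bit set.  Blanks
   around names are skipped; unknown names are ignored. */
static unsigned parse_keywords(const char *value)
{
    unsigned flags = 0;
    const char *p = value;

    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ',' && *p != ' ' && *p != '\t')
            p++;
        size_t len = (size_t)(p - start);
        for (size_t i = 0; i < N_KEYWORDS; i++)
            if (strlen(keywords[i].prop_name) == len
                && strncmp(start, keywords[i].prop_name, len) == 0)
                flags |= keywords[i].flag;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == ',')
            p++;
    }
    return flags;
}

/* Copy TEXT to OUT (capacity CAP), replacing each enabled "$Name$"
   with "$Name: value $"; VALUES[i] belongs to keywords[i].  Returns
   the output length, or (size_t)-1 if OUT is too small. */
static size_t expand_keywords(const char *text, unsigned flags,
                              const char *const values[N_KEYWORDS],
                              char *out, size_t cap)
{
    size_t o = 0;
    while (*text != '\0') {
        bool replaced = false;
        for (size_t i = 0; *text == '$' && i < N_KEYWORDS && !replaced; i++) {
            const char *name = keywords[i].placeholder;
            size_t n = strlen(name);
            if ((flags & keywords[i].flag)
                && strncmp(text + 1, name, n) == 0 && text[1 + n] == '$') {
                int w = snprintf(out + o, cap - o, "$%s: %s $", name, values[i]);
                if (w < 0 || (size_t)w >= cap - o)
                    return (size_t)-1;
                o += (size_t)w;
                text += n + 2;  /* skip "$Name$" */
                replaced = true;
            }
        }
        if (!replaced) {
            if (o + 1 >= cap)
                return (size_t)-1;
            out[o++] = *text++;
        }
    }
    if (o >= cap)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

With svn:keywords set to "author, revision", the text "x $Author$ $Date$" would come out as "x $Author: cmpilato $ $Date$" (given "cmpilato" as the author value), since date was not requested.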

This looks good, even if you ignore all my comments.


Received on Sat Oct 21 14:36:34 2006

This is an archived mail posted to the Subversion Dev mailing list.