
Re: Text mime types

From: Nicolas Goutte <nicolasg_at_snafu.de>
Date: 2005-06-08 16:27:23 CEST

On Wednesday 08 June 2005 13:15, Julian Foad wrote:
> [Replying only to the "dev" list, as we're now discussing the design.]
> Greg Thomas wrote:
> > Nicolas Goutte <nicolasg@snafu.de> wrote:
> >>It would be nice if Subversion had an svn:text property to tweak it
> >>independently (even if perhaps its default would be "look at the mime
> >> type").
> We have to recognise that there is NOT a hard distinction between "text"
> and "binary". There are different forms and degrees of "textiness".
> Examples in approximate order of decreasing "textiness": ASCII,
> iso-latin-1, UTF-8, UTF-16; text files with a bit of binary data at the
> beginning, middle or end; binary files with some text embedded.

Just be careful: as far as I understand encodings, the "binary data" could
be shift sequences of a specific encoding (for example some East Asian
encodings). In such a case, you cannot simply insert something like conflict
markers, even though the file looks like ASCII (or at least like an
ISO-8859-something file) apart from that binary data.
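To illustrate the kind of encoding meant here, a small Python sketch using the standard `iso2022_jp` codec: ISO-2022-JP is a 7-bit Japanese encoding that switches character sets with escape sequences. Every byte is below 0x80, so the file looks like ASCII to a byte-oriented check, yet the bytes between the escape sequences only make sense in the shifted state. Any tool that inserts or splices bytes without tracking that state risks producing text that decodes differently.

```python
# ISO-2022-JP switches to JIS X 0208 with ESC $ B and back to ASCII with ESC ( B.
text = "diff \u65e5\u672c\u8a9e end"   # "diff", three Japanese characters, "end"
data = text.encode("iso2022_jp")

# Every byte is 7-bit, so a naive "is it ASCII?" check says this is text...
all_seven_bit = all(b < 0x80 for b in data)
# ...but the escape sequences carry state that the bytes in between depend on.
has_shift = b"\x1b$B" in data and b"\x1b(B" in data

print(all_seven_bit, has_shift)  # True True
```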

The best way would be to actually detect the encoding, but as far as I
understand this is far from obvious and there are pitfalls (for example,
UTF-8 must be tested before any ISO-8859 encoding).
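A minimal Python sketch of why the order matters, with `guess_encoding` as a hypothetical helper that only distinguishes these two cases: ISO-8859-1 assigns a character to every one of the 256 byte values, so decoding as ISO-8859-1 never fails and would happily accept a UTF-8 file too. Strict UTF-8 decoding, in contrast, usually rejects ISO-8859 text containing accented characters, so it has to be tried first.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: try strict UTF-8 first, then fall back to ISO-8859-1."""
    try:
        data.decode("utf-8")       # strict: raises on invalid UTF-8 sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"        # never fails: every byte value is a character


utf8_bytes = "caf\u00e9".encode("utf-8")          # b'caf\xc3\xa9'
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # b'caf\xe9'

print(guess_encoding(utf8_bytes))    # utf-8
print(guess_encoding(latin1_bytes))  # iso-8859-1
# Testing ISO-8859-1 first would "succeed" on the UTF-8 file as well, as mojibake:
print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of both encodings.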

> Therefore it is wrong to have a flag that says just "This is text". We
> need to say "This is parseable by Subversion's built-in diff" or "This is
> displayable on the console" or other such precise statements.

That is why I would personally prefer (in addition) a property describing the
encoding. Then, if an svn tool cannot handle that encoding, it treats the file
as binary. That is also safe for (future) mixed Subversion environments, where
some parts of Subversion might be able to process an encoding and others might
not, depending on the client's version.

The only problem with an encoding is that a real binary file, for example an
executable, has no encoding at all. So such a file must be recognised or
forced to be treated as binary. (The svn:executable property will not help
here since, for example, an object file is not an executable but is still
binary.)
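The proposal in the two paragraphs above could be sketched in Python as follows. `svn:encoding` is a hypothetical property name used only for illustration (Subversion defines no such property), and `treat_as_binary` and the client encoding sets are likewise assumptions. A tool treats a file as binary when it does not support the declared encoding, and also when no encoding is declared at all, which covers executables and object files.

```python
# Hypothetical property: Subversion has no "svn:encoding"; it stands in for the
# encoding property proposed above.
ENCODING_PROP = "svn:encoding"


def treat_as_binary(props: dict, supported: set) -> bool:
    """Return True if a tool supporting `supported` encodings must treat the file as binary."""
    encoding = props.get(ENCODING_PROP)
    if encoding is None:
        # No encoding at all (e.g. an executable or object file): play safe.
        return True
    return encoding.lower() not in supported


old_client = {"us-ascii", "utf-8"}                 # hypothetical older client
new_client = {"us-ascii", "utf-8", "iso-2022-jp"}  # hypothetical newer client
props = {ENCODING_PROP: "ISO-2022-JP"}

print(treat_as_binary(props, old_client))  # True
print(treat_as_binary(props, new_client))  # False
print(treat_as_binary({}, new_client))     # True
```

Because each client decides only from its own supported set, a mixed environment degrades safely: the older client falls back to binary handling instead of, say, inserting conflict markers it cannot place correctly.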

> I don't think adding such flags to a file's properties is the way to go in
> general, because metadata should describe the file's inherent properties,
> not the manner in which it should be treated by certain specific tools. I

That is another advantage of specifying the encoding: the encoding is an
inherent property of the file.

> think we should implement those decisions as a configurable function of
> MIME type. It might possibly be useful to have such properties to override
> the general configuration in special cases.

Personally, I do not mind if such properties are only used to override a
default.

And if you want to add it to the MIME type in svn:mime-type, many MIME types
at least have a charset parameter (for example text/plain; charset=UTF-8).
But I am not sure that is the best place to put it.
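For the charset route, Python's standard `email.message.Message` can already split such a value into the media type and its `charset` parameter. `split_mime_type` below is a hypothetical wrapper, and storing the charset inside `svn:mime-type` this way is exactly the open question above, not established Subversion behavior.

```python
from email.message import Message
from typing import Optional, Tuple


def split_mime_type(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (media type, charset or None), both lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


print(split_mime_type("text/plain; charset=UTF-8"))  # ('text/plain', 'utf-8')
print(split_mime_type("application/octet-stream"))   # ('application/octet-stream', None)
```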

> > Currently, the determination of whether or not files are binary is a
> > bit arbitrary - a file is considered binary if it has a svn:mime-type
> > other than text/*, image/x-xbitmap or image/x-xpixmap.
> That's one part of it. Another part is looking at some of the bytes to see
> how close they are to ASCII. Subversion's determination and handling of
> textiness needs a fair bit of enhancement.
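The MIME-type part of the current rule, as Greg describes it, amounts to a predicate like the following Python sketch. `mime_type_is_binary` is a hypothetical name rather than Subversion's own code, ignoring `;` parameters is an assumption of this sketch, and the byte-sniffing part Julian mentions is not modelled.

```python
from typing import Optional

# The two image types that are exceptions to "not text/* means binary".
TEXTUAL_IMAGE_TYPES = {"image/x-xbitmap", "image/x-xpixmap"}


def mime_type_is_binary(mime_type: Optional[str]) -> bool:
    """Sketch of the rule: binary if svn:mime-type is set and is not textual."""
    if mime_type is None:
        return False  # no svn:mime-type: this rule alone does not mark it binary
    # Assumption: ignore parameters such as "; charset=UTF-8".
    base = mime_type.split(";", 1)[0].strip().lower()
    return not (base.startswith("text/") or base in TEXTUAL_IMAGE_TYPES)


for t in ["text/plain", "image/x-xpixmap", "image/png", "application/octet-stream"]:
    print(t, mime_type_is_binary(t))
```

Every further textual type outside text/* would need another entry in that exception set, which is the growing list of exceptions mentioned further down.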

As I wrote above, the real question is not whether a file is nearly ASCII
(even if that may be the relevant question for Subversion as it is today). It
is rather: does the svn tool in question support the encoding? If it does not,
it should handle the file as binary.

(Put another way: if you choose a strategy based on ASCII-likeness today, it
may make things harder on the day svn will (need to) handle non-ASCII-like
encodings.)
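UTF-16 is an example of such a non-ASCII-like encoding: every ASCII character becomes two bytes, one of them NUL, so a check for ASCII-likeness or for NUL bytes (a common heuristic, assumed here for illustration) would classify ordinary UTF-16 text as binary. A short Python demonstration:

```python
data = "hello".encode("utf-16-le")   # little-endian, no byte-order mark

print(data)             # b'h\x00e\x00l\x00l\x00o\x00'
print(b"\x00" in data)  # True: a NUL-byte heuristic would call this binary
```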

> > A simple svn:binary flag set if needed automatically when a file is
> > added (cf application/octet-stream) should make the whole thing a lot
> > simpler
> Make what simpler?
> > - it will also solve the problem of more exceptions being
> > added to the current list.
> No, it _moves_ the problem to "svn add" and "svn import". For users
> affected by the current inextensible determination of textiness, it would
> make life easier by requiring only a one-off tweak rather than a
> work-around each time the file is diffed etc., but it's not really a proper
> solution to the problem.

Yes, of course, if the detection is done automatically when adding or
importing, that would be great.

> - Julian

Have a nice day!

Received on Wed Jun 8 16:34:51 2005

This is an archived mail posted to the Subversion Dev mailing list.