
Re: Towards standardising mime-type support

From: Trent Apted <tapted_at_it.usyd.edu.au>
Date: 2005-04-01 14:43:15 CEST

Golly, no need to be rude.

Branko Čibej wrote:

> Trent Apted wrote:
>
>> Thanks for your reply.
>>
>> Branko Čibej wrote:
>>
>>> Trent Apted wrote:
>>>
>>>> When I run
>>>>
>>>> $ file -bi something.c
>>>>
>>>> (or .cpp, .h, .cc, etc.)
>>>>
>>>> /usr/bin/file reports that the mime-type for C/C++ files is
>>>>
>>>> text/x-c; charset=us-ascii
>>>
>>> Well, this is clearly wrong, there's no such thing as a "text/x-c"
>>> mime type.
>>
>> Perhaps true. RFC2046 *only* defines the 'plain' subtype of the text
>> mime type, but we all use text/html. It also says any unrecognised
>> subtypes of text should just be treated as text/plain, so perhaps
>> whether or not it is valid is moot.
>
> RFC2046 isn't the canonical reference. This
> (http://www.iana.org/assignments/media-types/) is the canonical
> reference.

Just because something hasn't been assigned doesn't mean it's invalid.
This is why there are types and subtypes.

>>>> However, if I feed this to Subversion, it treats the file as
>>>> binary. So, that's fine, I'll drop the charset stuff,
>>>
>>> Ah, yes, we should know about charset attributes.
>>>
>>>> and things are mostly back to normal, but the added information
>>>> still appears to be meaningless to Subversion.
>>>
>>> Define "meaningless". You've told SVN that this is a text file, and
>>> that's it. SVN doesn't interpret the mime type any further than that
>>> (yet).
>>
>> I guess I'm saying that text/x-c and text/x-cc would imply that a
>> file is source code, and hence platform independent, so it should
>> always use the 'native' eol-style and should never be executable,
>> whereas you would still specify the style and executability for
>> something that is text/plain. However, this might not suit everyone...
>
> Media types do not define the encoding, only the type of the contents.
> Therefore we a) can't extrapolate eol-style from the mime type, and b)
> would be totally wrong to do so because there are valid reasons _not_
> to use native eol-style even in mixed-platform environments.

a) I'm not convinced that we can't determine an eol-style because we
only know the type of contents, and b) I can't think of any reason why I
would want my source code in something other than the native format for
whatever platform I'm editing it on.

Perhaps I'm merely used to CVS taking care of all this for me.

>>>> Further, perhaps there should be a feature with support for
>>>> `/usr/bin/file -bi` --- "auto-auto-props" might be nice.
>>>
>>> We've talked about using platform-specific mechanisms to guess the
>>> mime type. What's missing is somebody with enough time on their
>>> hands to actually do this.
>>
>> /usr/bin/file uses a tab-separated file of the form:
>>
>> $ cat /usr/share/misc/file/magic.mime
>> # Magic data for KMimeMagic (originally for file(1) command)
>> #
>> # The format is 4-5 columns:
>> # Column #1: byte number to begin checking from, ">" indicates continuation
>> # Column #2: type of data to match
>> # Column #3: contents of data to match
>> # Column #4: MIME type of result
>> # Column #5: MIME encoding of result (optional)
>>
>> #------------------------------------------------------------------------------
>>
>> This would not be platform-specific.
>
> *ROTFL*
>
> Oh and surely it would not be platform specific at all, aye. It would
> work on *every* Linux box in the world (well, except some older ones).

Don't be silly. This is a plaintext file. Last I checked text files were
readable on my non-Linux computers -- Windows isn't that bad.

Sarcasm aside, the point is the way in which the mime type is
determined, which I had hoped would be evident in my choice of
quotation. Reading the first N bytes of a file and matching them to a
known sequence (which could be in a file distributed with Subversion or
statically linked via source automatically generated from the above
file) is in no way platform dependent. Obviously this is not a complete
solution, but it gets most of the way there.
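
To make that concrete, here is a rough sketch of the idea in Python. It
is only an illustration: the rule table, the function names and the
"application/java" label are made up for this example, and none of it is
taken from Subversion or copied from the real magic.mime file.

```python
# Hypothetical rule table: (byte offset, magic bytes, MIME type).
# The entries loosely follow the heuristics mentioned in this thread and
# are assumptions for illustration, not real magic.mime contents.
MAGIC_RULES = [
    (0, b"\xca\xfe\xba\xbe", "application/java"),  # Java class file magic
    (0, b"/*", "text/x-c"),                        # starts with a C comment
    (0, b"//", "text/x-cc"),                       # starts with a C++ comment
]


def guess_mime_type(data, default="application/octet-stream"):
    """Return the MIME type of the first rule whose bytes match `data`."""
    for offset, magic, mime_type in MAGIC_RULES:
        if data[offset:offset + len(magic)] == magic:
            return mime_type
    return default


def guess_file_mime_type(path, probe_size=64):
    """Read only the first `probe_size` bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return guess_mime_type(handle.read(probe_size))
```

Nothing here depends on the operating system: the file is opened in
binary mode and compared byte for byte, so the same table gives the same
answer on any platform.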

Perhaps my annoyance with having to specify a native eol-style each time
I add a new source file is testament to the fact that I am knowledgeable
about what it means for something to be platform dependent, having been
a developer of cross-platform software projects for some time now.

>> Actually, looking further it appears that the heuristic for C/C++
>> files is that if a file starts with "/*" it is C, and if with "//" it
>> is C++. `0xbabe` is a Java class file, etc.
>>
>> I can write a patch, if you like.. PhD students tend to find time for
>> all kinds of non-thesis pursuits ;-)
>
> Any change in this direction must be generic in the sense that it
> allows platform-specific implementations, and it must produce portable
> results. I will veto any patch that produces MIME types that are not
> in the IANA registry.
>
Maybe I'll just work on that patch, despite your discouragement, and see
if I can make you happy.

Thanks for your feedback,
    Trent.

Received on Fri Apr 1 14:56:34 2005

This is an archived mail posted to the Subversion Users mailing list.
