Re: Towards standardising mime-type support

From: Trent Apted <tapted_at_it.usyd.edu.au>
Date: 2005-04-01 09:38:57 CEST

Thanks for your reply.

Branko Čibej wrote:

> Trent Apted wrote:
>
>> When I run
>>
>> $ file -bi something.c
>>
>> (or .cpp, .h, .cc, etc.)
>>
>> /usr/bin/file reports that the mime-type for C/C++ files is
>>
>> text/x-c; charset=us-ascii
>
>
> Well, this is clearly wrong; there's no such thing as a "text/x-c"
> mime type.

Perhaps true. RFC 2046 *only* defines the 'plain' subtype of the text
mime type, but we all use text/html. It also says that any unrecognised
subtype of text should just be treated as text/plain (as long as the
charset is understood), so perhaps whether or not it is valid is moot.
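Here is a minimal Python sketch of the RFC 2046 fallback described above. It assumes a client that strips parameters such as `charset=us-ascii` before deciding whether a file is textual. This is not Subversion's code: `split_mime_type` and `is_text_mime_type` are hypothetical names, and treating every `text/*` subtype as text is the RFC's rule, applied here as an assumption.

```python
from __future__ import annotations


def split_mime_type(value: str) -> tuple[str, dict[str, str]]:
    """Split 'text/x-c; charset=us-ascii' into ('text/x-c', {'charset': 'us-ascii'})."""
    media_type, *raw_params = value.split(";")
    params: dict[str, str] = {}
    for raw in raw_params:
        name, sep, param_value = raw.partition("=")
        if sep:  # ignore malformed parameters that have no '='
            params[name.strip().lower()] = param_value.strip().strip('"')
    return media_type.strip().lower(), params


def is_text_mime_type(value: str) -> bool:
    """Hypothetical rule: any text/* subtype counts as text (RFC 2046 fallback)."""
    media_type, _params = split_mime_type(value)
    return media_type.startswith("text/")
```

With this rule, `is_text_mime_type("text/x-c; charset=us-ascii")` is true, so the charset parameter would no longer push a C file into the binary category.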

>> However, if I feed this to Subversion, it treats the file as binary.
>> So, that's fine, I'll drop the charset stuff,
>
>
> Ah, yes. We should know about charset attributes.
>
>> and things are mostly back to normal, but the added information still
>> appears to be meaningless to Subversion.
>
>
> Define "meaningless". You've told SVN that this is a text file, and
> that's it. SVN doesn't interpret the mime type any further than that
> (yet).

I guess I'm saying that text/x-c and text/x-cc would imply that a file
is source code, and hence platform independent, so it should always use
the 'native' eol-style and should never be executable. For something
that is merely text/plain, you would still specify the eol-style and
executability yourself. However, this might not suit everyone...
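The proposal can be sketched as a lookup from media type to implied properties. This is a hypothetical Python illustration: `SOURCE_CODE_TYPES` and `implied_properties` are invented names, the set of types is an assumption, and (per Branko's reply) Subversion itself only uses the mime type to decide text versus binary.

```python
from __future__ import annotations

# Hypothetical: media types taken to mean "platform-independent source code".
SOURCE_CODE_TYPES = frozenset({"text/x-c", "text/x-cc"})


def implied_properties(mime_type: str) -> dict[str, str | None]:
    """Return the svn properties a source-code media type would imply.

    A value of None means "must not be set" (here, svn:executable).
    text/plain and unrecognised types imply nothing, so the user still
    chooses eol-style and executability explicitly.
    """
    media_type = mime_type.split(";", 1)[0].strip().lower()
    if media_type in SOURCE_CODE_TYPES:
        return {"svn:eol-style": "native", "svn:executable": None}
    return {}
```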

>> Further, perhaps there should be a feature with support for
>> `/usr/bin/file -bi` --- "auto-auto-props" might be nice.
>
>
> We've talked about using platform-specific mechanisms to guess the
> mime type. What's missing is somebody with enough time on their hands
> to actually do this.
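One shape "auto-auto-props" could take, sketched in Python under the assumption that file(1) is on the PATH: run `file -bi`, keep only the media type, and hand it to `svn propset`. `guess_mime_type` and `propset_command` are hypothetical helpers. The charset is dropped only because, as described earlier in this thread, the full string made Subversion treat the file as binary.

```python
from __future__ import annotations

import subprocess


def guess_mime_type(path: str) -> str:
    """Ask file(1) for a MIME string: -b omits the file name, -i prints MIME output."""
    result = subprocess.run(
        ["file", "-bi", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()  # e.g. "text/x-c; charset=us-ascii"


def propset_command(path: str, file_output: str) -> list[str]:
    """Build an 'svn propset svn:mime-type' command from file(1) output.

    Only the media type is kept; the charset parameter is dropped.
    """
    media_type = file_output.split(";", 1)[0].strip()
    return ["svn", "propset", "svn:mime-type", media_type, path]
```

For example, `subprocess.run(propset_command("something.c", guess_mime_type("something.c")), check=True)` would set svn:mime-type to `text/x-c` on a working-copy file.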

/usr/bin/file uses a tab-separated file of the form:

$ cat /usr/share/misc/file/magic.mime
# Magic data for KMimeMagic (originally for file(1) command)
#
# The format is 4-5 columns:
# Column #1: byte number to begin checking from, ">" indicates continuation
# Column #2: type of data to match
# Column #3: contents of data to match
# Column #4: MIME type of result
# Column #5: MIME encoding of result (optional)

#------------------------------------------------------------------------------
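Because the format is documented in the file's own header, a reader for it needs no platform-specific code. Below is a minimal Python sketch that parses the column layout quoted above into records. `MagicRule` and `parse_magic_mime` are hypothetical names. The sketch assumes fields are separated by one or more tabs, and it skips anything it does not understand (such as indirect offsets) rather than interpreting it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MagicRule:
    level: int            # number of leading '>' characters (0 = top-level test)
    offset: int           # byte number to begin checking from
    data_type: str        # e.g. "string", "short", "long"
    contents: str         # match data exactly as written (escapes not decoded)
    mime_type: str
    encoding: str | None = None  # optional fifth column


def parse_magic_mime(text: str) -> list[MagicRule]:
    """Parse magic.mime-style lines; comments, blanks and unsupported lines are skipped."""
    rules: list[MagicRule] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: columns are tab-separated; runs of tabs act as one separator.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) < 4:
            continue
        offset_field = fields[0]
        level = len(offset_field) - len(offset_field.lstrip(">"))
        try:
            offset = int(offset_field[level:], 0)  # accepts "0", "4", "0x10"
        except ValueError:
            continue  # e.g. indirect offsets like "(4.l)" are not handled here
        encoding = fields[4] if len(fields) > 4 else None
        rules.append(MagicRule(level, offset, fields[1], fields[2], fields[3], encoding))
    return rules
```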

Reading this file would not be platform-specific. Actually, looking
further, it appears that the heuristic for C/C++ files is that if a file
starts with "/*" it is C, and if it starts with "//" it is C++. A file
starting with `0xcafebabe` is a Java class file,
etc.
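The prefix heuristics described above amount to comparing the first bytes of a file against a short table. Here is a minimal Python sketch with hypothetical names. The MIME strings for C++ and Java class files are assumptions, not values copied from magic.mime.

```python
from __future__ import annotations

# (prefix bytes, MIME type) pairs mirroring the heuristics described above.
# The C++ and Java MIME strings are assumptions for illustration.
PREFIX_RULES = [
    (b"\xca\xfe\xba\xbe", "application/java-vm"),  # Java class files begin with 0xCAFEBABE
    (b"/*", "text/x-c"),
    (b"//", "text/x-c++"),
]


def guess_from_prefix(data: bytes) -> str | None:
    """Return the first MIME type whose byte prefix matches at offset 0, else None."""
    for prefix, mime_type in PREFIX_RULES:
        if data.startswith(prefix):
            return mime_type
    return None
```

The sketch also shows how coarse such rules are. A C file that begins with `#include` matches nothing, and since C99 allows `//` comments, a C file starting with one would be labelled C++.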

I can write a patch, if you like. PhD students tend to find time for
all kinds of non-thesis pursuits ;-)

- Trent.

Received on Fri Apr 1 09:42:04 2005

This is an archived mail posted to the Subversion Users mailing list.
