
Re: svn_mime_type_is_binary, libmagic and application/xml

From: Ben Reser <ben_at_reser.org>
Date: Wed, 20 Nov 2013 00:55:51 -0800

On 11/19/13 11:39 PM, Stefan Sperling wrote:
> Hmm... I don't see a need to change anything.
> libmagic is used as a last resort. Subversion already offers several
> ways of configuring the svn:mime-type property, all of which supersede
> libmagic (svn propset/propedit, auto-props, svn:auto-props).
> So unless there is a really good reason why any of these cannot be used,
> I think we should tell our users to configure their clients appropriately.

There is a huge difference between telling users not to set
svn:mime-type/auto-props to application/xml for XML files they want treated as
text and telling them to remove the automatically added svn:mime-type that
libmagic is setting.

libmagic is causing us to set application/xml on any XML file regardless of
file extension. In order for a user to configure around this by setting
auto-props they'll have to anticipate every file extension that happens to be
detected as application/xml. That's rather burdensome.
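
To make the burden concrete, here is a minimal sketch of the precedence
described above. It is not Subversion code: the pattern table, the function
names and the example file names are all hypothetical, and the only behaviour
assumed is that an auto-props match supersedes whatever libmagic detected.

```python
from fnmatch import fnmatch

# Hypothetical auto-props table (filename pattern -> svn:mime-type).
# A real client would read these from its configuration; the entries
# here are made up for illustration.
AUTO_PROPS = {
    "*.xml": "text/xml",
    "*.xsd": "text/xml",
}


def auto_props_mime_type(filename):
    """Return the mime-type of the first matching pattern, or None."""
    for pattern, mime_type in AUTO_PROPS.items():
        if fnmatch(filename, pattern):
            return mime_type
    return None


def effective_mime_type(filename, detected_by_libmagic):
    """auto-props wins; the content-based detection is only a fallback."""
    return auto_props_mime_type(filename) or detected_by_libmagic


# An extension the user anticipated is covered by auto-props...
covered = effective_mime_type("pom.xml", "application/xml")
# ...but XML content under an extension nobody listed falls through.
missed = effective_mime_type("project.csproj", "application/xml")
```

Under these assumptions `covered` ends up as text/xml while `missed` stays
application/xml, so every extension that can hold XML has to be listed by hand.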

The alternative is that they build without libmagic, which is burdensome for
users that are using binaries. Another alternative is to allow users to
disable libmagic even if Subversion was built with it (which might not be a
bad idea even with the changes I propose). But I think that's really just
hiding the

> Additionally, users can use their own MAGIC files to control what
> mime-types libmagic gives to Subversion. We don't need to replicate
> this functionality in Subversion itself.

We're not duplicating functionality. libmagic doesn't tell you whether a file
is text; it tells you the format. In the case of XML, whether the file is text
or binary is ambiguous. However, I suspect most XML files that people are
putting into SVN are text (though I'm sure people will find all manner of
exceptions).

Really, though, the terms text and binary here are misleading. The issue isn't
so much whether the content is text or binary but whether the content is
diffable/mergeable, which is what we use the output of
svn_mime_type_is_binary() to decide.
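
As a rough illustration of why application/xml ends up on the binary side,
here is a Python approximation of that kind of check. It is a sketch of the
general rule (anything outside text/ counts as binary), not a port of the C
function; the two image exceptions are an assumption from memory and the
function name is only borrowed for readability.

```python
def mime_type_is_binary(mime_type):
    """Approximate rule: only text/* (plus a few exceptions) is not binary."""
    # Ignore parameters such as "; charset=utf-8".
    essence = mime_type.split(";", 1)[0].strip()
    if essence.startswith("text/"):
        return False
    # Assumed exceptions: image formats that are plain text on disk.
    if essence in ("image/x-xbitmap", "image/x-xpixmap"):
        return False
    return True


xml_is_binary = mime_type_is_binary("application/xml")
text_xml_is_binary = mime_type_is_binary("text/xml; charset=utf-8")
```

The decision depends only on the type string, so nothing about the file's
actual content (or how well it merges) enters into it.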

Even if libmagic scanned the entire file (which it doesn't) and looked at the
character set, it couldn't tell you whether a file is diffable/mergeable. The
file could be base64 encoded data, which will be no more mergeable than
unencoded data.
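
A small sketch of the base64 point, using only the Python standard library.
The payload is arbitrary made-up data; the idea is that the encoded form is
pure ASCII split into lines, yet a one-byte insertion in the underlying data
rewrites every encoded line, so a line-based merge has nothing to work with.

```python
import base64

# Arbitrary binary data, and the same data with one byte inserted in front.
payload = bytes(range(256)) * 4
edited = b"\x00" + payload

# encodebytes() wraps its output into lines of at most 76 characters.
encoded = base64.encodebytes(payload).decode("ascii").splitlines()
encoded_edited = base64.encodebytes(edited).decode("ascii").splitlines()

# Count the lines that survived the edit unchanged.
unchanged = sum(1 for a, b in zip(encoded, encoded_edited) if a == b)
```

For this payload `unchanged` is 0: the insertion shifts the 3-byte grouping,
so no encoded line stays the same even though only one byte was added.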

> libmagic uses a heuristic, so it can get things wrong. But if we start
> filtering libmagic output for the purpose of treating XML documents
> as text, we could eventually end up with a huge list of hard-coded
> exceptions for all sorts of things. Where do we stop? How can we make
> sure that everyone will be happy with a list we make up?
> The whole point of relying on libmagic is to avoid such a list.

We don't need to make everyone happy with the list we're providing. If we make
it configurable and we make it available as a server-directed configuration,
then everyone can make choices that are best for their project.
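
Roughly what I have in mind, as a sketch only. No such setting exists today;
the `text_like_types` parameter and the project list below are hypothetical
names invented for this example, and the default is deliberately empty so that
nothing changes unless a project opts in.

```python
def is_mergeable(mime_type, text_like_types=frozenset()):
    """Treat text/* as mergeable, plus any types the project opted into."""
    # Compare on the bare type, ignoring parameters and case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    return essence.startswith("text/") or essence in text_like_types


# Hypothetical per-project list, e.g. handed out by the server.
PROJECT_TEXT_LIKE = frozenset({"application/xml", "image/svg+xml"})

default_result = is_mergeable("application/xml")
project_result = is_mergeable("application/xml", PROJECT_TEXT_LIKE)
```

With the empty default, application/xml is still not merged as text; a project
that knows its XML is hand-edited can add it to its own list.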

That said, the bug that Johan linked to (which I'd long since forgotten about)
has a link to an email that I wrote in 2004 (that I also forgot about) that
makes a compelling argument as to why application/xml shouldn't default to
being treated as text. So I retract that portion of my proposal for now (if
and when we grow checkpoints and can undo updates/merges that lose local
changes, then we can adjust the default).

For reference that email is here:
Received on 2013-11-20 09:56:34 CET

This is an archived mail posted to the Subversion Dev mailing list.