Re: Classifying files as binary or text

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Sat, 14 Nov 2009 01:10:35 +0000

Mike Samuel wrote:
> 2009/11/13 Stefan Sperling <stsp_at_elego.de>:
> > On Fri, Nov 13, 2009 at 08:30:53AM +0100, B. Smith-Mannschott wrote:
> >> We've got lots of XML files in our repository. Some of them make sense
> >> to merge (Maven's pom.xml) and some do not (UML models stored in XMI
> >> format).

Should those two types of files have the same MIME type? If so,
certainly you have that problem, but I suspect they should not. If you
choose to label them both as plain XML (text/xml? can't remember), then
I think it is fair that tools should handle them in the same way.

> > I think this is a very good point.
> > So "textiness" does not imply "mergeable". A file might contain text
> > in some character encoding, but a merge tool may not be able to handle
> > it properly because of the application-level semantics the text represents.
> > It would just garble the file.
> >
> > Mike, I think this is quite important. It means that even just looking
> > for "text/*" is totally wrong. Extending this flawed mechanism by making
> > Subversion look for "; charset=" is not gonna help.

Well, it would help a little bit... just not very much.

> > The subject of this
> > thread should not be "Classifying files as binary or text" but "How to
> > determine whether a file is suitable for being merged?".


> Fair enough. And Branko's suggestion to start small with a property
> that can be extended from "should merge" to "how to merge" seems wiser
> given that the merge concern is the only real concern.

Think how we would use such a feature in our own source tree. Oh boy, we
would say, so now we have to add a property on every file? Am I going to
examine every file's contents by hand? No, I'll use filename matching to
get the vast majority: find . -name "*.{c,h,py,txt,...}", probably. And
maybe for some files I would want to search by the MIME type.

And then I would need to make sure all newly added files will get the
appropriate value too. I suppose I would try to hack it in with auto-props.

If my trawl through the repository was complete and my autoprops are set
up just right, I then have the property set the way I want it on all the
files.

But that is the long-winded way of doing it.

All the time, all I wanted was to base my choice on simple
machine-readable things: the file name, the MIME type (where known), and
a heuristic scan of the first 8000 bytes of the file as a fall-back.

I want to tell Subversion to base its choice of merge (currently just
text-line-based or non-mergeable) on the "kind of file" indicated by
those three already-existing parameters. I don't want to be able to mark
one C source file as mergeable and another C source file as not. That
level of configurability would be way over the top, and would only make
the tool harder to use because of the difficulty of keeping all the
settings in sync (all C files mergeable, for example).
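
To make the idea concrete, here is a rough sketch of a per-"kind of file"
decision based on those three inputs. It is not Subversion code. The
pattern list, the MIME prefix list, the function names and the NUL-byte
test used as the heuristic are all assumptions made for illustration; only
the 8000-byte prefix size comes from the text above.

```python
from fnmatch import fnmatchcase

SCAN_BYTES = 8000  # prefix size mentioned in the mail

# Hypothetical lists, for illustration only.
MERGEABLE_NAMES = ["*.c", "*.h", "*.py", "*.txt"]
MERGEABLE_MIME_PREFIXES = ["text/"]


def looks_like_text(prefix: bytes) -> bool:
    """Assumed heuristic: the scanned prefix contains no NUL byte."""
    return b"\0" not in prefix[:SCAN_BYTES]


def is_mergeable(filename, mime_type=None, prefix=b""):
    """Decide by file name first, then MIME type (where known),
    and only then fall back to the content heuristic."""
    if any(fnmatchcase(filename, pat) for pat in MERGEABLE_NAMES):
        return True
    if mime_type is not None:
        return any(mime_type.startswith(p) for p in MERGEABLE_MIME_PREFIXES)
    return looks_like_text(prefix)
```

Because the decision depends only on the kind of file, two C source files
can never end up with different answers, which is the property argued for
above.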

(My point is not invalidated by saying that we would either fall back to
the current method, or would assume text, if the new prop is not there.)

My recommendation:

So I believe we need a way to configure what combinations of (filename,
MIME type, heuristic) are considered diffable/mergeable, and allow this
configuration mechanism to be extensible to also say what combinations
of (filename, MIME type, heuristic) are to be diffed/merged by each of
one or more specified external diff/merge programs.
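
One way such a configuration could look is an ordered rule table where the
first matching rule wins. This is only a sketch under assumptions: the
Rule fields, the "builtin" marker and the tool name
"hypothetical-image-diff" are invented here and do not correspond to any
existing Subversion option.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name_glob: str = "*"               # matched against the file name
    mime_glob: Optional[str] = None    # None: MIME type is not tested
    text_like: Optional[bool] = None   # None: heuristic is not tested
    tool: Optional[str] = "builtin"    # None: not diffable/mergeable


# Hypothetical rule set; the first matching rule wins.
RULES = [
    Rule(name_glob="pom.xml", tool="builtin"),
    Rule(name_glob="*.xmi", tool=None),
    Rule(mime_glob="image/*", tool="hypothetical-image-diff"),
    Rule(text_like=True, tool="builtin"),
]


def choose_tool(filename, mime_type, text_like):
    """Return the diff/merge tool for a file, or None if not mergeable."""
    for rule in RULES:
        if not fnmatchcase(filename, rule.name_glob):
            continue
        if rule.mime_glob is not None and (
            mime_type is None or not fnmatchcase(mime_type, rule.mime_glob)
        ):
            continue
        if rule.text_like is not None and rule.text_like != text_like:
            continue
        return rule.tool
    return None
```

With these rules, pom.xml and an XMI model can both carry the same XML
MIME type and still be treated differently, and an external program can be
attached to a whole kind of file in one place.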

This proposal would work better with server-side configuration, of
course, but so would the auto-props that are necessary for the
dedicated-property proposal. That is a separate issue, and client-side
config will work well enough for the time being.

- Julian

> Please consider my proposal withdrawn.
> > So I'd favour Branko's proposal, a property which communicates to Subversion
> > whether file content is suitable to be passed to a diff or merge tool,
> > regardless of the mime-type. [...]
> >
> > The svn:mime-type property still has its place, e.g. it is useful
> > whenever a specific mime-type should be sent to a web browser browsing
> > a Subversion repository. But I agree with deprecating its use to detect
> > "mergeability".

Received on 2009-11-14 02:10:55 CET

This is an archived mail posted to the Subversion Dev mailing list.