Re: Classifying files as binary or text

From: Branko Cibej <brane_at_xbc.nu>
Date: Fri, 13 Nov 2009 03:25:45 +0100

Mike Samuel wrote:
> 2009/11/12 Branko Cibej <brane_at_xbc.nu>:
>> Mike Samuel wrote:
>>> 2009/11/12 Mark Phippard <markphip_at_gmail.com>:
>>>> On Thu, Nov 12, 2009 at 6:20 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
>>>>> Conclusions from the svn:charset thread that Mark pointed out:
>>>>> (1) This proposal should not gate on svn:charset since it isn't yet
>>>>> recognized as official
>>>>> (2) We should avoid the term encoding in documentation of this feature.
>>>>> (3) There may be some bad interactions between ";charset=" in
>>>>> svn:mime-type and auto-props, but this proposal does not raise new
>>>>> issues, and those issues are a result of an error (possibly since
>>>>> fixed?) in auto-props.
>>>>> From the svn:charset thread:
>>>>> Much of the early debate deals with svn:charset being non-standard and
>>>>> non-approved. I tend to agree with Stefan, that this proposal
>>>>> shouldn't gate on svn:charset being approved so I suggest tabling
>>>>> variant 1.
>>>> Correct me if I am wrong, but the only real goal we have right now is
>>>> to improve SVN's ability to tell itself "this is text" and I can do
>>>> textual merging?
>>> That is correct.
>>>> So why not just add an svn:text property that has a
>>>> value of '*'. The presence of the property means "treat this as
>>>> text".
>>> To make sure I understand your counter-proposal, would a file be
>>> treated as text if at least one of (svn:mime-type starts with "text/"
>>> or matches the existing whitelist) OR (svn:text exists and is "*")?
>>> Or are you advocating dropping the first clause which is there for
>>> backwards-compatibility?
>> I think we all agree that using the MIME-type to decide whether we can
>> use contextual text-base merge for a file has turned out to be trickier
>> than we originally expected. It makes sense to find a better solution to
>> the problem.
> I'm afraid that I'm unfamiliar with these discussions since I just
> joined this list and have never submitted a patch to SVN before.
> Can you explain why my argument that mime-types do specify "textiness"
> as described in RFC 2046 is flawed or point me at threads that discuss
> why the trickiness?

Media types in themselves are not enough to specify textiness, as I
believe you pointed out yourself; svn:mime-type is properly the file's
media type, since encoding and other attributes can affect the actual

The trickiness stems from the fact that it is not "easy" to answer the
question whether some media type describes a text-like file. Initially,
years ago, we somewhat naïvely assumed that if it matches the pattern
"text/*", then it's text, otherwise it's not. It turns out that a whole
group of media types in the application/* family are in fact text, and a
couple in image/* and so on; there's no actual restriction that I'm
aware of in the MIME rules that would forbid you to describe text in any
of the major media-type families.
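
To illustrate the problem, here is a minimal sketch of that original
"text/*" heuristic. The function name and the list of examples are
illustrative assumptions, not Subversion's actual code; the three media
types shown are registered types whose content is text.

```python
def naive_is_text(mime_type):
    """Original heuristic: a file is text only if its media type is text/*."""
    # Drop any parameters such as ";charset=utf-8" before matching.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return base_type.startswith("text/")


# Registered media types whose content is text but which fail the check.
textual_but_missed = ["application/xml", "application/json", "image/svg+xml"]

for media_type in ["text/plain; charset=utf-8"] + textual_but_missed:
    print(media_type, "->", naive_is_text(media_type))
```

Every entry in textual_but_missed would need its own special case, and
that list is what has to be kept up to date.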

What that amounts to is that, for completeness and correctness, you'd
have to carefully catalogue all the media types and applicable
attributes, and make complex decisions based on that data; *and* keep it
up to date, which apparently is not as easy as one would like (e.g., see
http://roy.gbiv.com/untangled/2009/wrangling-mimetypes). I don't think
the goal would justify the ongoing effort involved.

>> I would suggest just slightly future-proofing Mark's proposal, and
>> certainly staying backward-compatible. The rules should go like this:
>> * If a file has no content-related property (i.e., no svn:mime-type),
>> treat it as text.
>> * If there's only an svn:mime-type property, keep its current semantics.
>> * Introduce a new property that overrides svn:mime-type, but don't
>> call it svn:text (which implies it's a boolean), but, e.g., just
>> svn:type or svn:merge-mode or some such.
>> The value of this property is a keyword. Initially, the allowed values
>> would be "text" and "binary", but I can imagine adding "xml" in the
>> future if someone wants to add XML-aware merging (I've come across such
>> requests several times, and have used a similar feature in ClearCase to
>> good effect). svn:mime-type needn't go away or be deprecated; it's
>> useful in other contexts.
> That sounds like a fine idea, but a (how-to-merge?) property seems
> only tangentially related to (is-text?).

Actually, as far as Subversion is concerned, how-to-merge is the only
relevant question. Presentation of contents is someone else's problem
(most of the time).

> This is exactly the kind of
> overloading that I think should be avoided.

That is why I proposed that we retain svn:mime-type, since it is
possible to encode all the MIME attributes in that property, and thus
end up using that property to answer questions about the file's contents,
whilst using a new one to answer questions about how to merge it.
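
A rough sketch of how the three rules quoted above could be resolved,
purely as an illustration. The property name "svn:merge-mode" is only one
of the candidates floated in this thread, the function names are made up,
rejecting unknown keywords is an assumption, and the "current semantics"
of svn:mime-type are simplified here to a plain text/* check.

```python
ALLOWED_MODES = {"text", "binary"}  # "xml" might be added later


def mime_type_is_text(mime_type):
    # Simplified stand-in for the existing svn:mime-type semantics.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return base_type.startswith("text/")


def merge_mode(props):
    """Decide how to merge a file from its property dictionary."""
    # Rule 3: the new property, when present, overrides svn:mime-type.
    override = props.get("svn:merge-mode")  # hypothetical property name
    if override is not None:
        if override not in ALLOWED_MODES:
            # Assumption: unknown keywords are rejected, not guessed at.
            raise ValueError("unknown merge mode: %r" % override)
        return override

    # Rule 1: no content-related property at all means text.
    mime_type = props.get("svn:mime-type")
    if mime_type is None:
        return "text"

    # Rule 2: only svn:mime-type is set, so keep its current semantics.
    return "text" if mime_type_is_text(mime_type) else "binary"


print(merge_mode({"svn:mime-type": "application/xml",
                  "svn:merge-mode": "text"}))
```

Note that svn:mime-type is never modified here; it keeps describing the
contents while the new property alone answers the merge question.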

> Yes, merging and diffing
> are definitely related concerns, but doing type-sensitive merging is a
> big change

Not really. We do type-sensitive merging today, albeit in a very limited
way. And I prefer to _keep_ it as limited as makes sense.

> and I think that it would be good not to introduce new
> properties to enable it until there's significant agreement on goals
> and how to stage rollout.

Sure. Hence this discussion.

-- Brane

Received on 2009-11-13 03:26:59 CET

This is an archived mail posted to the Subversion Dev mailing list.