
Re: Classifying files as binary or text

From: Mark Phippard <markphip_at_gmail.com>
Date: Thu, 12 Nov 2009 20:43:30 -0500

On Thu, Nov 12, 2009 at 7:59 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
> 2009/11/12 Mark Phippard <markphip_at_gmail.com>:
>> On Thu, Nov 12, 2009 at 6:20 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
>>> Conclusions from the svn:charset thread that Mark pointed out:
>>> (1) This proposal should not gate on svn:charset since it isn't yet
>>> recognized as official
>>> (2) We should avoid the term encoding in documentation of this feature.
>>> (3) There may be some bad interactions between ";charset=" in
>>> svn:mime-type and auto-props, but this proposal does not raise new
>>> issues, and those issues are a result of an error (possibly since
>>> fixed?) in auto-props.
>>> From the svn:charset thread:
>>> Much of the early debate deals with svn:charset being non-standard and
>>> non-approved.  I tend to agree with Stefan, that this proposal
>>> shouldn't gate on svn:charset being approved so I suggest tabling
>>> variant 1.
>> Correct me if I am wrong, but the only real goal we have right now is
>> to improve SVN's ability to tell itself "this is text" and I can do
>> textual merging?
> That is correct.
>> So why not just add an svn:text property that has a
>> value of '*'.  The presence of the property means "treat this as
>> text".
> To make sure I understand your counter-proposal, would a file be
> treated as text if at least one of (svn:mime-type starts with "text/"
> or matches the existing whitelist) OR (svn:text exists and is "*")?
> Or are you advocating dropping the first clause which is there for
> backwards-compatibility?

We would need to be backwards-compatible. Any new property would
exist so that a file with a mime-type of, say, application/xml could be
treated as text. But if there is no mime-type, or a text/* mime-type,
it should also still be treated as text.
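
A rough sketch of that rule, in Python for brevity. This is only an
illustration, not Subversion code: svn:text is the property proposed in
this thread, and the is_text name and the whitelist parameter are made
up here as stand-ins for the existing check.

```python
SVN_MIME_TYPE = "svn:mime-type"
SVN_TEXT = "svn:text"  # proposed in this thread, not an existing property


def is_text(props, whitelist=frozenset()):
    """Return True if a file with these properties should be treated as text.

    props     -- dict mapping property names to values
    whitelist -- hypothetical set of non-text/* mime types already
                 accepted as text (placeholder for the existing whitelist)
    """
    # New rule: the mere presence of svn:text means "treat this as text".
    if SVN_TEXT in props:
        return True

    # Backwards-compatible rules: no mime type at all, or a text/* type.
    mime_type = props.get(SVN_MIME_TYPE)
    if mime_type is None:
        return True

    # Ignore any parameters such as ";charset=..." when comparing.
    base = mime_type.split(";", 1)[0].strip().lower()
    return base.startswith("text/") or base in whitelist
```

With this, {"svn:mime-type": "application/xml"} alone is not text, but
adding "svn:text": "*" to the same properties makes it text.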

>> My problem with charset is that it has implications that SVN does
>> something based on the charset.  For example, maybe it creates an
>> expectation that we validate the content of the file with the stated
>> charset, or that we can convert the content if you change the charset.
>>  Why use a property whose value has meaning if we do not do anything
>> with that meaning.  I do not think it makes sense to drag in hook
>> scripts or what other clients might do either, as there is nothing
>> stopping people from adding their own charset property.
> I think a new property is warranted to avoid overloading meaning of an
> existing one.
> I don't think this qualifies as overloading though.
> The svn:mime-type property is already linked to this determination,
> and for backwards compatibility that should not change.
> The concept of "is-textual" is linked to the "text/*" mime-type group,
> which the current implementation takes into account, and to the
> charset mime-type attribute in RFC 2046, which the current
> implementation does not take into account; so I view this as an
> attempt to fix an incomplete interpretation of an existing standard.

I just think it is more complicated than you think. For example, I
assume you are aware that SVN cannot perform textual merges
(currently) for UTF-16 or UTF-32 encoded files. So if you have an
svn:mime-type property of: application/xml;charset=utf16 then you have
to parse that and know to treat it as binary. So what do you do if you
just get charset=foo or charset=xyz? I think this takes us in the
wrong direction.
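
To make the concern concrete, here is a sketch (Python, purely
illustrative) of what a charset-driven check would have to do. The
function names and both charset sets are assumptions made up for this
example; only the UTF-16/UTF-32 limitation comes from the text above.

```python
# Charsets that cannot be merged as text (the UTF-16/UTF-32 case above).
NON_MERGEABLE = {"utf16", "utf-16", "utf32", "utf-32"}
# Assumed examples of charsets that line-based merging can handle.
MERGEABLE = {"ascii", "us-ascii", "utf8", "utf-8"}


def charset_of(mime_type):
    """Extract the charset parameter from a mime-type value, or None."""
    for param in mime_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def classify(mime_type):
    """Return 'text', 'binary', 'unknown', or 'no-charset'."""
    charset = charset_of(mime_type)
    if charset is None:
        return "no-charset"
    if charset in NON_MERGEABLE:
        return "binary"
    if charset in MERGEABLE:
        return "text"
    # charset=foo or charset=xyz ends up here: there is no obvious answer.
    return "unknown"
```

The "unknown" branch is the open question: every value outside the two
sets needs a policy decision that a plain svn:text flag would avoid.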

Then there is the other class of issues I raised. For example, a file
has a value of charset=ascii but the file content is really UTF-8, and
someone thinks we should validate the content.

>> So why not just make this simpler?  With an svn:text property you just
>> have to change any routine that determines if a file is text to look
>> for the presence of that property first, and then continue with the
>> other checks if it is not found.
> Assuming you advocate the backwards compatible option in my question
> above, I think multiplying properties unnecessarily is a move towards
> greater complexity.

I think we should go for something that is more specific. Branko's
ideas are fine too and would lay the groundwork for expanding it in
the future.

Mark Phippard
Received on 2009-11-13 02:43:42 CET

This is an archived mail posted to the Subversion Dev mailing list.