Re: Classifying files as binary or text

From: Mark Phippard <markphip_at_gmail.com>
Date: Thu, 12 Nov 2009 20:43:30 -0500

On Thu, Nov 12, 2009 at 7:59 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
> 2009/11/12 Mark Phippard <markphip_at_gmail.com>:
>> On Thu, Nov 12, 2009 at 6:20 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
>>> Conclusions from the svn:charset thread that Mark pointed out:
>>> (1) This proposal should not gate on svn:charset since it isn't yet
>>> recognized as official
>>> (2) We should avoid the term encoding in documentation of this feature.
>>> (3) There may be some bad interactions between ";charset=" in
>>> svn:mime-type and auto-props, but this proposal does not raise new
>>> issues, and those issues are a result of an error (possibly since
>>> fixed?) in auto-props.
>>>
>>>
>>> From the svn:charset thread:
>>>
>>> Much of the early debate deals with svn:charset being non-standard and
>>> non-approved.  I tend to agree with Stefan, that this proposal
>>> shouldn't gate on svn:charset being approved so I suggest tabling
>>> variant 1.
>>
>> Correct me if I am wrong, but the only real goal we have right now is
>> to improve SVN's ability to tell itself "this is text" and I can do
>> textual merging?
>
> That is correct.
>
>
>> So why not just add an svn:text property that has a
>> value of '*'.  The presence of the property means "treat this as
>> text".
>
> To make sure I understand your counter-proposal, would a file be
> treated as text if at least one of (svn:mime-type starts with "text/"
> or matches the existing whitelist) OR (svn:text exists and is "*")?
>
> Or are you advocating dropping the first clause which is there for
> backwards-compatibility?

We would need to be backwards-compatible. Any new property would
exist so that a file with a mime-type of, say, application/xml could be
treated as text. But if there is no mime-type, or a text/* mime-type,
the file should still be treated as text.
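
As a rough illustration of that rule, here is a minimal Python sketch.
It is not Subversion code: the function name, the dict of properties,
and the whitelist contents are assumptions made for the example, and
svn:text is only the property proposed in this thread.

```python
# Hypothetical sketch of the backwards-compatible check discussed above.
# Nothing here is a real Subversion API; props is a plain dict that maps
# property names to values.

# Assumed stand-in for the existing whitelist of non-text/* types that
# are still treated as text.
TEXT_WHITELIST = frozenset({"image/x-xbitmap", "image/x-xpixmap"})


def is_text(props):
    """Return True if a file with these properties is treated as text."""
    # Proposed rule: the mere presence of svn:text means "treat as text".
    if "svn:text" in props:
        return True

    mime_type = props.get("svn:mime-type")
    # Existing rule: no mime-type at all means text.
    if mime_type is None:
        return True

    # Existing rule: text/* or a whitelisted type means text. Anything
    # after ';' (such as a charset parameter) is ignored here.
    base = mime_type.split(";", 1)[0].strip().lower()
    return base.startswith("text/") or base in TEXT_WHITELIST
```

With this ordering, application/xml alone stays binary, and adding
svn:text to the same file makes it mergeable as text without changing
the result for any file that lacks the new property.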

>> My problem with charset is that it has implications that SVN does
>> something based on the charset.  For example, maybe it creates an
>> expectation that we validate the content of the file with the stated
>> charset, or that we can convert the content if you change the charset.
>>  Why use a property whose value has meaning if we do not do anything
>> with that meaning.  I do not think it makes sense to drag in hook
>> scripts or what other clients might do either, as there is nothing
>> stopping people from adding their own charset property.
>
> I think a new property is warranted to avoid overloading meaning of an
> existing one.
> I don't think this qualifies as overloading though.
> The svn:mime-type property is already linked to this determination,
> and for backwards compatibility that should not change.
> The concept of "is-textual" is linked to the "text/*" mime-type group,
> which the current implementation takes into account, and to the
> charset mime-type attribute in RFC 2046, which the current
> implementation does not take into account; so I view this as an
> attempt to fix an incomplete interpretation of an existing standard.

I just think it is more complicated than you think. For example, I
assume you are aware that SVN cannot perform textual merges
(currently) for UTF-16 or UTF-32 encoded files. So if you have an
svn:mime-type property of: application/xml;charset=utf16 then you have
to parse that and know to treat it as binary. So what do you do if you
just get charset=foo or charset=xyz? I think this takes us in the
wrong direction.
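
To make the problem concrete, here is a small Python sketch of what a
charset-based check would have to do. The two charset sets are
assumptions picked for the example, not lists that Subversion
maintains, and the function names are hypothetical.

```python
# Hypothetical sketch: deciding text vs. binary from the charset
# parameter of an svn:mime-type value. The sets below are illustrative
# assumptions only.
MERGEABLE_CHARSETS = frozenset({"us-ascii", "ascii", "utf-8", "utf8"})
UNMERGEABLE_CHARSETS = frozenset({"utf-16", "utf16", "utf-32", "utf32"})


def charset_of(mime_type):
    """Return the lower-cased charset parameter, or None if absent."""
    for param in mime_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def classify(mime_type):
    """Return 'text', 'binary', 'unknown', or 'no-charset'."""
    charset = charset_of(mime_type)
    if charset is None:
        return "no-charset"
    if charset in UNMERGEABLE_CHARSETS:
        return "binary"
    if charset in MERGEABLE_CHARSETS:
        return "text"
    # charset=foo or charset=xyz lands here, and there is no obviously
    # right answer for it.
    return "unknown"
```

The "unknown" branch is the case in question: any list of charsets is
incomplete, so some values will fall through with no defined behavior.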

Then there is the other class of issues I raised. For example, a file
has a value of charset=ascii but the file content is really UTF-8, and
someone thinks we should validate the content.

>> So why not just make this simpler?  With an svn:text property you just
>> have to change any routine that determines if a file is text to look
>> for the presence of that property first, and then continue with the
>> other checks if it is not found.
>
> Assuming you advocate the backwards compatible option in my question
> above, I think multiplying properties unnecessarily is a move towards
> greater complexity.

I think we should go for something that is more specific. Branko's
ideas are fine too and would lay the groundwork for expanding it in
the future.

-- 
Thanks
Mark Phippard
http://markphip.blogspot.com/
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2417348
Received on 2009-11-13 02:43:42 CET

This is an archived mail posted to the Subversion Dev mailing list.