
Re: Classifying files as binary or text

From: Mike Samuel <mikesamuel_at_gmail.com>
Date: Thu, 12 Nov 2009 16:59:55 -0800

2009/11/12 Mark Phippard <markphip_at_gmail.com>:
> On Thu, Nov 12, 2009 at 6:20 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
>> Conclusions from the svn:charset thread that Mark pointed out:
>> (1) This proposal should not gate on svn:charset since it isn't yet
>> recognized as official
>> (2) We should avoid the term encoding in documentation of this feature.
>> (3) There may be some bad interactions between ";charset=" in
>> svn:mime-type and auto-props, but this proposal does not raise new
>> issues, and those issues are a result of an error (possibly since
>> fixed?) in auto-props.
>>
>>
>> From the svn:charset thread:
>>
>> Much of the early debate deals with svn:charset being non-standard and
>> non-approved.  I tend to agree with Stefan that this proposal
>> shouldn't gate on svn:charset being approved, so I suggest tabling
>> variant 1.
>
> Correct me if I am wrong, but the only real goal we have right now is
> to improve SVN's ability to tell itself "this is text" and I can do
> textual merging?

That is correct.

> So why not just add an svn:text property that has a
> value of '*'.  The presence of the property means "treat this as
> text".

To make sure I understand your counter-proposal: would a file be
treated as text if (svn:mime-type starts with "text/" or matches the
existing whitelist) OR (svn:text exists and is "*")?

Or are you advocating dropping the first clause, which is there for
backwards compatibility?
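A minimal Python sketch of the combined rule as restated above, assuming properties arrive as a plain dict of name to value. The whitelist contents, the `is_text` and `base_mime_type` names, and the dict representation are hypothetical illustrations, not Subversion's actual code. An unset svn:mime-type is treated as text, matching Subversion's default. The svn:text check runs first, as in the counter-proposal, so the existing checks only run when it is absent.

```python
# Hypothetical stand-in for the existing whitelist of non-"text/" MIME types
# that are still treated as text; the real list may differ.
TEXT_MIME_WHITELIST = {"image/x-xbitmap", "image/x-xpixmap"}


def base_mime_type(value):
    """Return the lowercased type/subtype, ignoring any ';param=...' suffix."""
    return value.split(";", 1)[0].strip().lower()


def is_text(props):
    """Classify a file as text from its versioned properties (hypothetical API).

    props: dict mapping property names to values.
    """
    # Proposed svn:text clause: checked first, per the counter-proposal.
    if props.get("svn:text") == "*":
        return True
    mime = props.get("svn:mime-type")
    # No svn:mime-type at all: Subversion's default is to treat the file as text.
    if mime is None:
        return True
    # Existing backwards-compatible clause: "text/*" or a whitelisted type.
    base = base_mime_type(mime)
    return base.startswith("text/") or base in TEXT_MIME_WHITELIST


if __name__ == "__main__":
    print(is_text({"svn:mime-type": "application/xml"}))                   # False
    print(is_text({"svn:mime-type": "application/xml", "svn:text": "*"}))  # True
    print(is_text({"svn:mime-type": "text/plain; charset=UTF-8"}))         # True
```

Dropping the first clause would reduce `is_text` to the svn:text check plus the unset-mime-type default, which is why it matters for backwards compatibility.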

> My problem with charset is that it has implications that SVN does
> something based on the charset.  For example, maybe it creates an
> expectation that we validate the content of the file with the stated
> charset, or that we can convert the content if you change the charset.
>  Why use a property whose value has meaning if we do not do anything
> with that meaning?  I do not think it makes sense to drag in hook
> scripts or what other clients might do either, as there is nothing
> stopping people from adding their own charset property.

A new property would be warranted if it were needed to avoid
overloading the meaning of an existing one, but I don't think this
qualifies as overloading.
The svn:mime-type property is already linked to this determination,
and for backwards compatibility that should not change.
The concept of "is-textual" is linked to the "text/*" MIME type group,
which the current implementation takes into account, and to the
charset parameter of a MIME type in RFC 2046, which the current
implementation does not take into account. So I view this as an
attempt to fix an incomplete interpretation of an existing standard.
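The charset interpretation can be sketched the same way: a MIME type carrying a charset parameter (for example, svn:mime-type set to "application/xml; charset=UTF-8") counts as text even though it is not "text/*". This is a minimal Python illustration of that reading, not Subversion's implementation; the function names and whitelist are hypothetical, and the parameter parsing is deliberately simplified rather than a full RFC 2045 parser.

```python
# Hypothetical whitelist, as in the existing check; contents are illustrative.
TEXT_MIME_WHITELIST = {"image/x-xbitmap", "image/x-xpixmap"}


def parse_mime_type(value):
    """Split 'type/subtype; name=value; ...' into (base, params).

    Simplified: splits on every ';', so quoted values containing ';' are not
    handled; surrounding double quotes are stripped from values.
    """
    parts = value.split(";")
    base = parts[0].strip().lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip malformed parameters without '='
        # Parameter names are case-insensitive; values are kept as written.
        params[name.strip().lower()] = val.strip().strip('"')
    return base, params


def mime_type_is_text(value):
    """Existing "text/*"/whitelist clause plus the proposed charset rule."""
    base, params = parse_mime_type(value)
    if base.startswith("text/") or base in TEXT_MIME_WHITELIST:
        return True
    # Proposed: a charset parameter marks the content as textual.
    return "charset" in params


if __name__ == "__main__":
    print(parse_mime_type('Application/XML; Charset="UTF-8"'))
    # ('application/xml', {'charset': 'UTF-8'})
    print(mime_type_is_text("application/xml; charset=UTF-8"))  # True
    print(mime_type_is_text("application/octet-stream"))        # False
```

Under this reading no new property is needed: the textual hint travels inside svn:mime-type itself.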

> So why not just make this simpler?  With an svn:text property you just
> have to change any routine that determines if a file is text to look
> for the presence of that property first, and then continue with the
> other checks if it is not found.

Assuming you advocate the backwards-compatible option in my question
above, I think multiplying properties unnecessarily is a move towards
greater complexity.

> --
> Thanks
>
> Mark Phippard
> http://markphip.blogspot.com/
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2417339
Received on 2009-11-13 02:00:10 CET

This is an archived mail posted to the Subversion Dev mailing list.