On Thu, Nov 12, 2009 at 7:59 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
> 2009/11/12 Mark Phippard <markphip_at_gmail.com>:
>> On Thu, Nov 12, 2009 at 6:20 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
>>> Conclusions from the svn:charset thread that Mark pointed out:
>>> (1) This proposal should not gate on svn:charset since it isn't yet
>>> recognized as official
>>> (2) We should avoid the term encoding in documentation of this feature.
>>> (3) There may be some bad interactions between ";charset=" in
>>> svn:mime-type and auto-props, but this proposal does not raise new
>>> issues, and those issues are a result of an error (possibly since
>>> fixed?) in auto-props.
>>> From the svn:charset thread:
>>> Much of the early debate deals with svn:charset being non-standard and
>>> non-approved. I tend to agree with Stefan, that this proposal
>>> shouldn't gate on svn:charset being approved so I suggest tabling
>>> variant 1.
>> Correct me if I am wrong, but the only real goal we have right now is
>> to improve SVN's ability to tell itself "this is text and I can do
>> textual merging"?
> That is correct.
>> So why not just add an svn:text property that has a
>> value of '*'. The presence of the property means "treat this as
> To make sure I understand your counter-proposal, would a file be
> treated as text if at least one of (svn:mime-type starts with "text/"
> or matches the existing whitelist) OR (svn:text exists and is "*")?
> Or are you advocating dropping the first clause which is there for
We would need to be backwards-compatible. Any new property would
exist so that a file with a mime-type of say application/xml could be
treated as text. But if there is no mime type, or a text/* mime type,
it should also still be treated as text.
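
As a rough illustration of the backwards-compatible rule described above,
here is a minimal Python sketch. It is not Subversion's code: the function
name, the dict-of-properties input, and the whitelist contents are all
assumptions made for the example, and svn:text is only the property
proposed in this thread.

```python
# Hypothetical sketch of the backwards-compatible "is this text?" rule.
# Nothing here is Subversion's actual implementation.

# Illustrative stand-in for "the existing whitelist" mentioned in the thread.
TEXT_WHITELIST = {"image/x-xbitmap", "image/x-xpixmap"}


def is_text(props):
    """Return True if a file with these properties should be merged as text.

    props is a dict mapping property names to values for a single file.
    """
    # Proposed new property: its presence with value "*" forces text handling.
    if props.get("svn:text") == "*":
        return True

    mime = props.get("svn:mime-type")
    # No mime type at all: still treated as text.
    if mime is None:
        return True

    # Ignore any ";charset=..." style parameters for this check.
    base = mime.split(";", 1)[0].strip().lower()
    return base.startswith("text/") or base in TEXT_WHITELIST
```

With this sketch, is_text({"svn:mime-type": "application/xml"}) is False,
while adding "svn:text": "*" to the same properties makes it True.
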
>> My problem with charset is that it has implications that SVN does
>> something based on the charset. For example, maybe it creates an
>> expectation that we validate the content of the file with the stated
>> charset, or that we can convert the content if you change the charset.
>> Why use a property whose value has meaning if we do not do anything
>> with that meaning? I do not think it makes sense to drag in hook
>> scripts or what other clients might do either, as there is nothing
>> stopping people from adding their own charset property.
> I think a new property is warranted to avoid overloading meaning of an
> existing one.
> I don't think this qualifies as overloading though.
> The svn:mime-type property is already linked to this determination,
> and for backwards compatibility that should not change.
> The concept of "is-textual" is linked to the "text/*" mime-type group,
> which the current implementation takes into account, and to the
> charset mime-type attribute in RFC 2046, which the current
> implementation does not take into account; so I view this as an
> attempt to fix an incomplete interpretation of an existing standard.
I just think it is more complicated than you think. For example, I
assume you are aware that SVN cannot perform textual merges
(currently) for UTF-16 or UTF-32 encoded files. So if you have an
svn:mime-type property of: application/xml;charset=utf16 then you have
to parse that and know to treat it as binary. So what do you do if you
just get charset=foo or charset=xyz? I think this takes us in the
Then there is the other class of issues I raised, such as a file that has
a value of charset=ascii while the file content is really UTF-8, and
someone thinks we should validate the content.
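
To make the parsing concern concrete, here is a small Python sketch of
what a charset-aware check might have to do. It is only an illustration
under stated assumptions: the function names, the three-way
"text"/"binary"/"unknown" result, and both charset lists are invented for
the example and are not part of Subversion.

```python
# Hypothetical sketch: classify an svn:mime-type value that may carry a
# charset parameter. Names and charset lists are assumptions, not SVN code.

# Charsets the thread says cannot currently be merged as text.
NOT_MERGEABLE = {"utf16", "utf-16", "utf-16le", "utf-16be",
                 "utf32", "utf-32", "utf-32le", "utf-32be"}
# Illustrative list of charsets assumed safe for line-based merging.
KNOWN_MERGEABLE = {"ascii", "us-ascii", "utf8", "utf-8", "iso-8859-1"}


def split_mime_type(value):
    """Split 'type/subtype; name=value' into (base type, parameter dict)."""
    parts = [p.strip() for p in value.split(";")]
    base = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:  # skip fragments that have no '='
            params[name.strip().lower()] = val.strip().strip('"').lower()
    return base, params


def classify(value):
    """Return 'text', 'binary', or 'unknown' for a mime-type value."""
    base, params = split_mime_type(value)
    charset = params.get("charset")
    if charset is None:
        # No charset: fall back to the text/* test.
        return "text" if base.startswith("text/") else "binary"
    if charset in NOT_MERGEABLE:
        return "binary"
    if charset in KNOWN_MERGEABLE:
        return "text"
    # charset=foo, charset=xyz: no obviously right answer.
    return "unknown"
```

Here classify("application/xml;charset=utf16") gives "binary", and
classify("application/xml;charset=foo") gives "unknown", which is the
case that has no obvious answer. Note that the sketch never looks at the
file content, so a mislabeled charset=ascii on a UTF-8 file would go
unnoticed.
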
>> So why not just make this simpler? With an svn:text property you just
>> have to change any routine that determines if a file is text to look
>> for the presence of that property first, and then continue with the
>> other checks if it is not found.
> Assuming you advocate the backwards compatible option in my question
> above, I think multiplying properties unnecessarily is a move towards
> greater complexity.
I think we should go for something that is more specific. Branko's
ideas are fine too and would lay the groundwork for expanding it in
Received on 2009-11-13 02:43:42 CET