Re: Classifying files as binary or text

From: Mike Samuel <mikesamuel_at_gmail.com>
Date: Thu, 12 Nov 2009 15:20:00 -0800

Conclusions from the svn:charset thread that Mark pointed out:
(1) This proposal should not gate on svn:charset, since svn:charset is
not yet recognized as an official property.
(2) We should avoid the term "encoding" in documentation of this feature.
(3) There may be some bad interactions between ";charset=" in
svn:mime-type and auto-props, but this proposal does not raise new
issues; those issues stem from an error (possibly since fixed?) in
auto-props.

From the svn:charset thread:

Much of the early debate concerns svn:charset being non-standard and
unapproved. I tend to agree with Stefan that this proposal shouldn't
gate on svn:charset being approved, so I suggest tabling variant 1.

http://svn.haxx.se/dev/archive-2008-06/0948.shtml
    One argument in favor of svn:charset, independently of the above, is
    that unlike the current trick of appending "; charset=%s" to the MIME
    type, it works with auto-props.
http://svn.haxx.se/dev/archive-2008-06/0983.shtml concludes that the
problem lies with auto-props. I tend to agree, and even if auto-props
still has problems with spaces, that issue is separable from this
proposed change.
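For readers unfamiliar with the "; charset=%s" convention quoted above, here is a minimal Python sketch of how such a value can be built and how the charset parameter can be read back out of an svn:mime-type value. The names with_charset and charset_from_mime_type are hypothetical (they are not Subversion APIs), and the parser is a simplification of MIME parameter syntax: it does not handle quoted values that themselves contain ';'.

```python
from typing import Optional


def with_charset(mime_type: str, charset: str) -> str:
    """Hypothetical helper: append a charset using the "; charset=%s" trick."""
    return "%s; charset=%s" % (mime_type, charset)


def charset_from_mime_type(mime_type: str) -> Optional[str]:
    """Hypothetical helper: return the charset parameter, or None if absent.

    Simplified: tolerates whitespace around ';' and '=', matches the
    parameter name case-insensitively, and strips surrounding double
    quotes, but does not handle quoted values containing ';'.
    """
    # The first ';'-separated field is type/subtype; the rest are parameters.
    _, *params = mime_type.split(";")
    for param in params:
        name, sep, value = param.partition("=")
        if not sep:
            continue  # parameter without '='; ignore it
        if name.strip().lower() == "charset":
            value = value.strip()
            # Remove surrounding double quotes, e.g. charset="utf-8".
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            return value or None
    return None


print(with_charset("text/html", "ISO-8859-1"))                 # text/html; charset=ISO-8859-1
print(charset_from_mime_type("text/plain; charset=UTF-8"))     # UTF-8
print(charset_from_mime_type('text/plain;charset="utf-8"'))    # utf-8
print(charset_from_mime_type("application/octet-stream"))      # None
```

Splitting on ';' first keeps the type/subtype separate from its parameters, which is also why a tool that treats ';' as its own separator (as the auto-props discussion suggests) has to be careful with such values.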

http://svn.haxx.se/dev/archive-2008-06/0962.shtml
Mentions confusion between character set and encoding.

This is valid. My understanding is that
* a character set is a mapping from byte strings to code-point strings
* a content encoding is a mapping from byte strings to strings in some
other token set, so that they can be unpacked at the other end into a
byte array.
The former assumes that the data being relayed is textual. The latter
assumes that the data being relayed is binary and that it is being
converted so that it fits into an envelope that uses a particular
token set.
The relevant standards already specify that the charset MIME-type
parameter is the former, so any documentation of this feature should
reference that and avoid using the term "encoding."
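To make the distinction above concrete, here is a minimal Python sketch. It uses UTF-8 and Latin-1 as example character sets and base64 as an example content encoding; these particular choices are illustrative and are not taken from the thread.

```python
import base64

# Character set: interprets a byte string as text, yielding code points.
raw = "h\u00e9llo".encode("utf-8")      # 6 bytes: b'h\xc3\xa9llo'
text = raw.decode("utf-8")              # 5 code points
code_points = [ord(c) for c in text]    # [104, 233, 108, 108, 111]

# The same bytes read under a different charset give different code points,
# which is why the charset must be known to treat the file as text.
latin1_text = raw.decode("latin-1")     # 6 code points: 'h\u00c3\u00a9llo'

# Content encoding: makes arbitrary bytes fit a restricted token set
# (base64's 64-character alphabet) and is undone to recover the same bytes.
# It says nothing about whether those bytes are text.
blob = bytes([0x00, 0xFF, 0x10, 0x80])
wire = base64.b64encode(blob)           # b'AP8QgA=='
assert base64.b64decode(wire) == blob   # round-trips to the original bytes
```

The charset step ends in code points, while the content-encoding step ends back in bytes, matching the two definitions above.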

2009/11/12 Mike Samuel <mikesamuel_at_gmail.com>:
> 2009/11/12 Mark Phippard <markphip_at_gmail.com>:
>> On Thu, Nov 12, 2009 at 5:02 PM, Mike Samuel <mikesamuel_at_gmail.com> wrote:
>>> Variant 1:
>>> Append the criteria above with
>>> (3) Use the charset from svn:charset if there is none from (2)
>>> See http://svn.haxx.se/dev/archive-2008-06/0941.shtml
>>
>> When I suggested you look at that thread it was not so much for you to
>> include it in this proposal as to read the discussion. It seemed like
>> the discussion was all relevant to this (and I seem to recall the
>> usage of svn:charset was not considered the way to do this).
>>
>> For the most part I like your idea. I did not completely re-read that
>> other thread either but I seem to recall there were some legitimate
>> concerns raised that you might need to address.
>
> Ah, I will read through that thread and try to summarize the issues here.
>
>> --
>> Thanks
>>
>> Mark Phippard
>> http://markphip.blogspot.com/
>>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2417309
Received on 2009-11-13 00:20:11 CET