Mike Samuel wrote:
> 2009/11/12 Branko Čibej <brane_at_xbc.nu>:
>> Mike Samuel wrote:
>>> 2009/11/12 Branko Čibej <brane_at_xbc.nu>:
>>>> What that amounts to is that, for completeness and correctness, you'd
>>>> have to carefully catalogue all the media types and applicable
>>>> attributes, and make complex decisions based on that data; *and* keep it
>>>> up to date, which apparently is not as easy as one would like (e.g., see
>>>> http://roy.gbiv.com/untangled/2009/wrangling-mimetypes). I don't think
>>>> the goal would justify the ongoing effort involved.
>>> Why? Why not just take into account the charset mime-type parameter
>>> which can only be present on texty types?
>>> I'm not suggesting this as an ultimate solution, just an incremental
>>> improvement of an existing feature that is already well documented and
>>> understood in other domains.
>> What do you do if the charset attribute isn't present? You either fall
>> back to the current "broken" behaviour, or you interpret the media type.
> The current proposal outlined in the first mail of this thread is to
> classify a file as texty if
> (1) the current "broken" behavior says so
> (2) OR if there is a charset attribute present regardless of value.
Your proposal and Mark's differ only in extending the parsing of
svn:mime-type vs. introducing a new property. Mine adds the option of
having more fine-grained choices about diff algorithms in the future
without actually having to know anything about specific media types, but
that's just future-proofing, not an immediate requirement.
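For concreteness, the two-part rule quoted above could be sketched roughly
as follows. This is only an illustration, not Subversion code: the function
names are hypothetical, and legacy_is_text() is an assumed stand-in for the
current check (taken here to be "the media type starts with text/").

```python
def parse_mime_type(value):
    """Split a property value into (media_type, params). Hypothetical helper."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:  # ignore fragments that are not name=value pairs
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def legacy_is_text(media_type):
    # Assumption: stand-in for the existing behaviour, not the real check.
    return media_type.startswith("text/")


def is_texty(value):
    """Rule (1) OR rule (2): legacy says text, or a charset parameter exists."""
    media_type, params = parse_mime_type(value)
    return legacy_is_text(media_type) or "charset" in params


if __name__ == "__main__":
    for v in ("text/plain", "application/xml; charset=UTF-8", "application/xml"):
        print(v, "->", is_texty(v))
```

Note that the charset value itself is never inspected, which matches
"regardless of value" in rule (2).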
On the surface, the choice appears simple: roll a die, or flip a coin,
or (heh) have a duel. But Hyrum gives a very good example in the working
copy library nightmare. Even picking "strcmp" over "regex.match" can
have far-reaching consequences, but I'm more concerned about the
/potential/ creature feep: "Oh, we do /this/ and /this/ with
svn:mime-type, why not check it for colour and taste, too." Somewhat
overdone, perhaps, but it's /so/ easy to fall into the trap of just
adding a little something to an existing feature.
>> (Oh, we don't properly interpret the Unicode end-of-line code point in
>> UTF-8 files.)
> End of line codepoint? Are you talking about U+2028 and U+2029?
Those would appear to be the ones, yes. I don't know offhand if the
paragraph separator implies end-of-line.
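A small illustration of why these two code points are easy to miss: a
scanner that looks only for the LF byte in UTF-8 data never sees them,
while a Unicode-aware splitter (Python's str.splitlines here, used purely
as an example) treats both as line boundaries. The sample text is made up.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

text = "first" + LS + "second" + PS + "third\nfourth"
data = text.encode("utf-8")

# Byte-oriented view: only the LF byte counts as an end of line.
byte_lines = data.split(b"\n")

# Unicode-aware view: str.splitlines() also breaks on U+2028 and U+2029.
unicode_lines = text.splitlines()

if __name__ == "__main__":
    print(len(byte_lines), "lines when splitting on LF bytes")
    print(len(unicode_lines), "lines with str.splitlines()")
    print("U+2028 in UTF-8:", LS.encode("utf-8"))
    print("U+2029 in UTF-8:", PS.encode("utf-8"))
```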
Received on 2009-11-13 04:27:44 CET