On 30.10.2014 05:47, Sean Leonard wrote:
> On Oct 26, 2014, at 11:35 AM, Branko Čibej <brane_at_wandisco.com> wrote:
>
>> On 26.10.2014 16:06, Sean Leonard wrote:
>>> On 10/26/2014 3:21 AM, Branko Čibej wrote:
>>>> On 26.10.2014 05:49, Sean Leonard wrote:
>>>>> On 10/25/2014 5:59 PM, Branko Čibej wrote:
>>>>>> On 25.10.2014 20:53, Sean Leonard wrote:
>>>>>>> It appears that the matter was not fully resolved. svn:charset seems
>>>>>>> to enjoy de-facto use.
>>>>>> If anyone is using svn:charset, they're violating our rules. The svn:
>>>>>> namespace is reserved for property names defined by Subversion, and
>>>>>> we've not defined that name. So ... using that name is likely to cause
>>>>>> problems at some point.
>>>>> Ok. So I guess the issue of how Subversion encodes a particular
>>>>> character set/character encoding is still "live"?
>>>> Well, Subversion doesn't "encode" anything; but for the purpose of
>>>> serving content straight from the repository through an HTTP server, the
>>>> established way to define the character set is to add the tag to the
>>>> svn:mime-type property, e.g.:
>>>>
>>>> svn propset svn:mime-type 'text/plain; charset=UTF-8' file...
>>> Actually I have a different proposal: what if the property name is the
>>> parameter, prefixed with "svn:mime-type:", and the property value is
>>> the UTF-8 encoded parameter value?
>>>
>>> For example:
>> [...]
>>
>> You must be confusing Subversion with some Web content management system. :)
> Well, Subversion is a content management system…at least for source code. :) And that includes source code for websites. Thus in that sense, it would be a Web content management system. :)
>
> Internally one of my projects is using it for document storage, and it has been working out pretty well. Much cheaper than document management systems that cost hundreds of thousands of dollars.
>
>> The fact that the svn:mime-type property is usable in any way for
>> serving content from the repository is more or less an accident; it's
>> definitely not a design goal. What you propose would have zero benefit
>> for Subversion as a version control system but less than trivial
>> maintenance costs, so it's not likely to ever happen.
>>
>>> Are there other places where this property is parsed or interpreted?
>> It's used by mod_dav_svn to populate Content-Type; but, as I said,
>> that's not the purpose of the property.
> Well, I don’t know if it has “zero” benefit. The benefit is that it is easier to retrieve and manipulate the semantic values without needing to do intricate parsing of svn:mime-type.
>
> However, it seems that there is running code that will pass the parameters as-is to the Content-Type field. That is sufficient reason for me (on top of the “; ” delimiter check in validate.c) to conclude that, if you want to store media type parameters in Subversion, you should do it with RFC 2045 style semicolon-delimited data after the media type in the “svn:mime-type” property. Thanks!
>
>>> OK. How does Subversion restrict the value to US-ASCII?
>> It turns out I was wrong about that; we don't restrict the value to
>> US-ASCII. See svn_mime_type_validate in subversion/libsvn_subr/validate.c.
> Ok, thanks. Yep, that settles it regarding the parameters—there is a comment in there.
>
> On the other hand, the code contradicts what you’re saying:
>   /* Check the mime type for illegal characters. See RFC 1521. */
>   for (i = 0; i < len; i++)
>     {
>       if (&mime_type[i] != slash_pos
>           && (! svn_ctype_isascii(mime_type[i])
>               || svn_ctype_iscntrl(mime_type[i])
>               || svn_ctype_isspace(mime_type[i])
>               || (strchr(tspecials, mime_type[i]) != NULL)))
>         return svn_error_createf
>
>
> That suggests that the characters are limited to US-ASCII.
Meh ... you're right and I'm stupid these days.
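For illustration, here is a minimal standalone sketch of the kind of check the quoted loop performs. It is not Subversion's code: sketch_mime_type_ok is a hypothetical name, the standard <ctype.h> functions stand in for the svn_ctype_* helpers, and the tspecials string is taken from the RFC 1521 token grammar. It also assumes that only the part before the first ';' or space is checked, as the delimiter check mentioned earlier suggests.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* RFC 1521 "tspecials": characters that may not appear in a token. */
static const char tspecials[] = "()<>@,;:\\\"/[]?=";

/* Hypothetical helper, not Subversion's API.  Returns 1 if the
   type/subtype part of MIME_TYPE passes the character check, else 0. */
static int
sketch_mime_type_ok(const char *mime_type)
{
  size_t len, i;
  const char *slash_pos;

  assert(mime_type != NULL);

  /* Assumption: only the part before the first ';' or ' ' is checked. */
  len = strcspn(mime_type, "; ");
  slash_pos = strchr(mime_type, '/');

  /* Require a non-empty media type with a '/' inside the checked part. */
  if (len == 0 || slash_pos == NULL || slash_pos >= mime_type + len)
    return 0;

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) mime_type[i];

      if (&mime_type[i] != slash_pos
          && (c > 127                     /* not US-ASCII */
              || iscntrl(c)
              || isspace(c)
              || strchr(tspecials, c) != NULL))
        return 0;
    }

  return 1;
}
```

Under these assumptions, "text/plain" and "text/plain; charset=UTF-8" pass, while a media type containing a UTF-8 byte sequence or a parenthesis is rejected.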
> While this may be true for RFC 2045, e-mail can now contain UTF-8 headers (RFC 6530). It is not yet clear, however, whether UTF-8 headers apply to MIME headers, specifically Content-Type. Also, the code above is not *entirely* correct, as a Content-Type header can contain linear whitespace (LWSP), meaning that \r\n(spaces)(more content) is permissible; it gets collapsed to a single space.
"May contain" is not the same as "must contain". The svn:mime-type
property is not equivalent to the Content-Type header and it's a mistake
to assume it is; e.g., as you note, it may not contain newlines, but
that does not prevent us from populating Content-Type from
svn:mime-type. That is a one-way path; we don't ever try to set
svn:mime-type from the value of some arbitrary Content-Type header.
After reading RFCs 6530 and 6532, I conclude that Content-Type can now
contain UTF-8 since neither document explicitly forbids that. However,
we can't extend the domain of svn:mime-type property values because we
must maintain backwards compatibility.
So, yes, you're restricted to using quoting kluges if you want to embed
Unicode characters in the value. You'll also note that you can't use
comments in svn:mime-type, because we forbid parentheses.
All that said, I'm sure we'd consider a client-side patch that would
allow users to use UTF-8 when setting svn:mime-type in the client, or
via the svn_client API, and would do the necessary quoting and unquoting
transparently. Do you think you're up to having a go at producing such a
patch?
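As a starting point, the quoting and unquoting step might look roughly like the sketch below. This thread does not settle on a quoting scheme, so the percent-encoding used here (in the style of RFC 2231 extended parameter values) is an assumption, and sketch_quote, sketch_unquote, is_plain and hex_value are hypothetical names, not existing svn_client functions. The set of bytes left unencoded is deliberately conservative: ASCII letters, digits, '-', '.' and '_'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not part of Subversion. */

/* Bytes that are copied through unchanged (a conservative choice). */
static int
is_plain(unsigned char c)
{
  return c < 128 && (isalnum(c) || c == '-' || c == '.' || c == '_');
}

/* Percent-encode the UTF-8 bytes of SRC into DST (DST_SIZE bytes).
   Returns the encoded length, or (size_t)-1 if DST is too small. */
static size_t
sketch_quote(const char *src, char *dst, size_t dst_size)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  if (dst_size == 0)
    return (size_t) -1;

  for (; *src != '\0'; src++)
    {
      unsigned char c = (unsigned char) *src;
      size_t need = is_plain(c) ? 1 : 3;

      /* Keep one byte in reserve for the terminating NUL. */
      if (n + need + 1 > dst_size)
        return (size_t) -1;

      if (is_plain(c))
        dst[n++] = (char) c;
      else
        {
          dst[n++] = '%';
          dst[n++] = hex[c >> 4];
          dst[n++] = hex[c & 0x0F];
        }
    }

  dst[n] = '\0';
  return n;
}

/* Value of one hex digit, or -1 if C is not a hex digit. */
static int
hex_value(unsigned char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Reverse sketch_quote in place.  Returns 0 on success, or -1 on a
   malformed escape (S may be partially rewritten in that case). */
static int
sketch_unquote(char *s)
{
  char *out = s;

  assert(s != NULL);
  while (*s != '\0')
    {
      if (*s == '%')
        {
          int hi = hex_value((unsigned char) s[1]);
          int lo = (hi < 0) ? -1 : hex_value((unsigned char) s[2]);

          if (lo < 0)
            return -1;
          *out++ = (char) (hi * 16 + lo);
          s += 3;
        }
      else
        *out++ = *s++;
    }

  *out = '\0';
  return 0;
}
```

A client would call something like sketch_quote on a parameter value before storing it in svn:mime-type, and sketch_unquote when displaying it. A real patch would also have to decide how to mark a value as encoded, which this sketch leaves open.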
-- Brane
Received on 2014-10-30 11:52:40 CET