On Oct 26, 2014, at 11:35 AM, Branko Čibej <brane_at_wandisco.com> wrote:
> On 26.10.2014 16:06, Sean Leonard wrote:
>> On 10/26/2014 3:21 AM, Branko Čibej wrote:
>>> On 26.10.2014 05:49, Sean Leonard wrote:
>>>> On 10/25/2014 5:59 PM, Branko Čibej wrote:
>>>>> On 25.10.2014 20:53, Sean Leonard wrote:
>>>>>> It appears that the matter was not fully resolved. svn:charset seems
>>>>>> to enjoy de-facto use.
>>>>> If anyone is using svn:charset, they're violating our rules. The svn:
>>>>> namespace is reserved for property names defined by Subversion, and
>>>>> we've not defined that name. So ... using that name is likely to cause
>>>>> problems at some point.
>>>> Ok. So I guess the issue of how Subversion encodes a particular
>>>> character set/character encoding is still "live"?
>>> Well, Subversion doesn't "encode" anything; but for the purpose of
>>> serving content straight from the repository through an HTTP server, the
>>> established way to define the character set is to add the tag to the
>>> svn:mime-type property, e.g.:
>>> svn propset svn:mime-type 'text/plain; charset=UTF-8' file...
>> Actually I have a different proposal: what if the property name is the
>> parameter, prefixed with "svn:mime-type:", and the property value is
>> the UTF-8 encoded parameter value?
>> For example:
> You must be confusing Subversion with some Web content management system. :)
Well, Subversion is a content management system…at least for source code. :) And that includes source code for websites. Thus in that sense, it would be a Web content management system. :)
Internally one of my projects is using it for document storage, and it has been working out pretty well. Much cheaper than document management systems that cost hundreds of thousands of dollars.
> The fact that the svn:mime-type property is usable in any way for
> serving content from the repository is more or less an accident; it's
> definitely not a design goal. What you propose would have zero benefit
> for Subversion as a version control system but non-trivial
> maintenance costs, so it's not likely to ever happen.
>> Are there other places where this property is parsed or interpreted?
> It's used by mod_dav_svn to populate Content-Type; but, as I said,
> that's not the purpose of the property.
Well, I don’t know if it has “zero” benefit. The benefit is that it is easier to retrieve and manipulate the parameter values without needing to do intricate parsing of svn:mime-type.
However, it seems that there is running code that passes the parameters through as-is to the Content-Type field. That is sufficient reason for me (on top of the “; ” delimiter check in validate.c) to conclude that if you want to store media type parameters in Subversion, you should store them as RFC 2045-style, semicolon-delimited parameters after the media type in the “svn:mime-type” property. Thanks!
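For reference, pulling one parameter back out of such a value looks roughly like this. It is a minimal sketch in C, assuming the value follows the RFC 2045 "type/subtype; attribute=value" layout. The names mime_type_charset and ascii_ieq are hypothetical (not part of Subversion's API), and the parser is deliberately simplified: no backslash escapes inside quoted strings and no RFC 2231 continuations.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the LEN bytes at A with the string B
   (parameter names are case-insensitive per RFC 2045). */
static int
ascii_ieq(const char *a, size_t len, const char *b)
{
  size_t i;

  if (strlen(b) != len)
    return 0;
  for (i = 0; i < len; i++)
    if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
      return 0;
  return 1;
}

/* Hypothetical helper (not a Subversion API): copy the value of the
   "charset" parameter of an svn:mime-type value such as
   "text/plain; charset=UTF-8" into OUT.  Returns 1 on success, 0 if the
   parameter is missing or does not fit in OUTSZ bytes.  Simplified:
   no backslash escapes in quoted strings, no RFC 2231 continuations. */
int
mime_type_charset(const char *value, char *out, size_t outsz)
{
  const char *p = strchr(value, ';');  /* parameters follow the first ';' */

  while (p != NULL)
    {
      const char *name, *val;
      size_t name_len, val_len;

      p++;                                      /* skip ';' */
      while (*p == ' ' || *p == '\t')
        p++;
      name = p;
      while (*p && *p != '=' && *p != ';' && *p != ' ' && *p != '\t')
        p++;
      name_len = (size_t)(p - name);
      while (*p == ' ' || *p == '\t')
        p++;
      if (*p != '=')
        {
          p = strchr(p, ';');                   /* malformed: next param */
          continue;
        }
      p++;                                      /* skip '=' */
      while (*p == ' ' || *p == '\t')
        p++;
      if (*p == '"')
        {                                       /* quoted-string value */
          val = ++p;
          while (*p && *p != '"')
            p++;
        }
      else
        {                                       /* token value */
          val = p;
          while (*p && *p != ';' && *p != ' ' && *p != '\t')
            p++;
        }
      val_len = (size_t)(p - val);

      if (ascii_ieq(name, name_len, "charset"))
        {
          if (val_len + 1 > outsz)
            return 0;                           /* too long for OUT */
          memcpy(out, val, val_len);
          out[val_len] = '\0';
          return 1;
        }
      p = strchr(p, ';');                       /* move on to next param */
    }
  return 0;
}
```

Called as mime_type_charset("text/plain; charset=UTF-8", buf, sizeof buf), it copies UTF-8 into buf and returns 1; for a bare "text/plain" it returns 0.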
>> OK. How does Subversion restrict the value to US-ASCII?
> It turns out I was wrong about that; we don't restrict the value to
> US-ASCII. See svn_mime_type_validate in subversion/libsvn_subr/validate.c.
Ok, thanks. Yep, that settles it regarding the parameters: there is a comment in there.
On the other hand, the code contradicts what you’re saying:
/* Check the mime type for illegal characters. See RFC 1521. */
for (i = 0; i < len; i++)
  if (&mime_type[i] != slash_pos
      && (! svn_ctype_isascii(mime_type[i])
          || (strchr(tspecials, mime_type[i]) != NULL)))
    /* (body omitted in this excerpt: the value is rejected) */
That suggests that the characters are limited to US-ASCII.
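To see which part of the value that loop actually inspects, here is a standalone sketch of the same check outside Subversion. Assumptions: the scan length is the media type portion before the first ';' or space (the “; ” delimiter check mentioned earlier), tspecials is the RFC 2045 set, a plain "< 0x80" test stands in for svn_ctype_isascii, and media_type_is_valid is a hypothetical name.

```c
#include <string.h>

/* RFC 2045 tspecials; none may appear in a type or subtype token. */
static const char tspecials[] = "()<>@,;:\\\"/[]?=";

/* Hypothetical stand-in for the quoted check: returns 1 if the media
   type portion of VALUE (everything before the first ';' or space) is
   non-empty, contains a '/', and has only ASCII, non-tspecial characters
   apart from that first '/'; 0 otherwise.  Text after the delimiter
   (the parameters) is never inspected. */
int
media_type_is_valid(const char *value)
{
  size_t len = strcspn(value, "; ");      /* length of the media type */
  const char *slash = strchr(value, '/');
  size_t i;

  if (len == 0 || slash == NULL || slash >= value + len)
    return 0;                             /* empty, or no '/' in media type */

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char)value[i];

      /* A second '/' is caught here, since '/' is in tspecials. */
      if (&value[i] != slash
          && (c >= 0x80 || strchr(tspecials, c) != NULL))
        return 0;                         /* non-ASCII or tspecial */
    }
  return 1;
}
```

In this sketch, a value whose parameters contain UTF-8 bytes still passes, because the loop stops at the ';', while a non-ASCII byte inside the subtype itself fails.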
While this may be true for RFC 2045, e-mail can now contain UTF-8 headers (RFC 6530). It is not yet clear, however, whether UTF-8 headers apply to MIME headers, specifically Content-Type.
Also, the code above is not *entirely* correct, as a Content-Type header can contain linear whitespace (LWSP), meaning that \r\n(spaces)(more content) is permissible; it gets collapsed to a single space.
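A minimal sketch of that folding behavior in C, for illustration only. unfold_header is a hypothetical helper operating in place on a NUL-terminated string. Step 1 is RFC 5322 unfolding (a CRLF immediately followed by a space or tab is removed). Step 2 collapses each run of blanks to one space, as described above for structured fields.

```c
#include <stddef.h>

/* Hypothetical helper: normalize a folded header field body in place.
   1. Unfold: drop every CRLF immediately followed by a space or tab
      (RFC 5322, section 2.2.3); the whitespace itself is kept.
   2. Collapse each run of spaces/tabs into a single space.
   A CRLF not followed by whitespace is left untouched.
   Returns the new length of S. */
size_t
unfold_header(char *s)
{
  char *r = s, *w = s;   /* read and write cursors */
  int in_ws = 0;         /* inside a run of blanks? */

  while (*r != '\0')
    {
      if (r[0] == '\r' && r[1] == '\n' && (r[2] == ' ' || r[2] == '\t'))
        {
          r += 2;                 /* step 1: remove the CRLF, keep the WSP */
          continue;
        }
      if (*r == ' ' || *r == '\t')
        {
          if (!in_ws)
            *w++ = ' ';           /* step 2: first blank of a run -> ' ' */
          in_ws = 1;
        }
      else
        {
          *w++ = *r;
          in_ws = 0;
        }
      r++;
    }
  *w = '\0';
  return (size_t)(w - s);
}
```

For example, "text/plain;\r\n charset=UTF-8" becomes "text/plain; charset=UTF-8".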
Received on 2014-10-30 05:49:19 CET