
Re: svn:mime-type arbitrary parameters

From: Sean Leonard <dev+ietf_at_seantek.com>
Date: Sat, 1 Nov 2014 04:22:10 -0700

On Oct 30, 2014, at 3:52 AM, Branko Čibej <brane_at_wandisco.com> wrote:

> On 30.10.2014 05:47, Sean Leonard wrote:
>> On Oct 26, 2014, at 11:35 AM, Branko Čibej <brane_at_wandisco.com> wrote:
>>
>>> On 26.10.2014 16:06, Sean Leonard wrote:
>>>> On 10/26/2014 3:21 AM, Branko Čibej wrote:
>>>>> On 26.10.2014 05:49, Sean Leonard wrote:
>>>>>> On 10/25/2014 5:59 PM, Branko Čibej wrote:
>>>>>>> On 25.10.2014 20:53, Sean Leonard wrote:
>>>>>>>> It appears that the matter was not fully resolved. svn:charset seems
>>>>>>>> to enjoy de-facto use.
>>>>>>> If anyone is using svn:charset, they're violating our rules. The svn:
>>>>>>> namespace is reserved for property names defined by Subversion, and
>>>>>>> we've not defined that name. So ... using that name is likely to cause
>>>>>>> problems at some point.
>>>>>> Ok. So I guess the issue of how Subversion encodes a particular
>>>>>> character set/character encoding is still "live"?
>>>>> Well, Subversion doesn't "encode" anything; but for the purpose of
>>>>> serving content straight from the repository through an HTTP server, the
>>>>> established way to define the character set is to add the tag to the
>>>>> svn:mime-type property, e.g.:
>>>>>
>>>>> svn propset svn:mime-type 'text/plain; charset=UTF-8' file...
>>>> Actually I have a different proposal: what if the property name is the
>>>> parameter, prefixed with "svn:mime-type:", and the property value is
>>>> the UTF-8 encoded parameter value?
>>>>
>>>> For example:
>>> [...]
>>>
>>> You must be confusing Subversion with some Web content management system. :)
>> Well, Subversion is a content management system…at least for source code. :) And that includes source code for websites. Thus in that sense, it would be a Web content management system. :)
>>
>> Internally one of my projects is using it for document storage, and it has been working out pretty well. Much cheaper than document management systems that cost hundreds of thousands of dollars.
>>
>>> The fact that the svn:mime-type property is usable in any way for
>>> serving content from the repository is more or less an accident; it's
>>> definitely not a design goal. What you propose would have zero benefit
>>> for Subversion as a version control system but non-trivial
>>> maintenance costs, so it's not likely to ever happen.
>>>
>>>> Are there other places where this property is parsed or interpreted?
>>> It's used by mod_dav_svn to populate Content-Type; but, as I said,
>>> that's not the purpose of the property.
>> Well, I don’t know if it has “zero” benefit. The benefit is that it is easier to retrieve and manipulate the semantic values without needing to do intricate parsing of svn:mime-type.
>>
>> However, it seems that there is running code that will pass the parameters as-is to the Content-Type field. That is sufficient reason for me (on top of the “; ” delimiter check in validate.c) to conclude that, if you want to store media type parameters in Subversion, you should do it with RFC 2045-style semicolon-delimited parameters after the media type in the “svn:mime-type” property. Thanks!
>>
>>>> OK. How does Subversion restrict the value to US-ASCII?
>>> It turns out I was wrong about that; we don't restrict the value to
>>> US-ASCII. See svn_mime_type_validate in subversion/libsvn_subr/validate.c.
>> Ok, thanks. Yep, that settles it regarding the parameters—there is a comment in there.
>>
>> On the other hand, the code contradicts what you’re saying:
>>   /* Check the mime type for illegal characters. See RFC 1521. */
>>   for (i = 0; i < len; i++)
>>     {
>>       if (&mime_type[i] != slash_pos
>>           && (! svn_ctype_isascii(mime_type[i])
>>               || svn_ctype_iscntrl(mime_type[i])
>>               || svn_ctype_isspace(mime_type[i])
>>               || (strchr(tspecials, mime_type[i]) != NULL)))
>>         return svn_error_createf
>>
>>
>> That suggests that the characters are limited to US-ASCII.
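
As an illustration of what the quoted loop rejects, here is a minimal standalone sketch in plain C. It is not the Subversion source: media_type_chars_ok is a hypothetical name, the standard <ctype.h> functions stand in for the svn_ctype_* helpers, and the rule that the media type ends at the first ';' or space is an assumption based on the delimiter check mentioned earlier in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The "tspecials" of RFC 1521; none of them may appear inside a token. */
static const char TSPECIALS[] = "()<>@,;:\\\"/[]?=";

/* Hypothetical helper, not part of the Subversion API.
   Returns 1 if the media type portion of VALUE contains only permitted
   characters, 0 otherwise. */
int
media_type_chars_ok(const char *value)
{
  size_t len;
  const char *slash_pos;
  size_t i;

  assert(value != NULL);

  /* Assumption: the media type ends at the first ';' or space. */
  len = strcspn(value, "; ");
  slash_pos = strchr(value, '/');

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) value[i];

      /* The first '/' separates type and subtype, so it is allowed. */
      if (&value[i] == slash_pos)
        continue;

      /* Reject non-ASCII bytes, control characters, whitespace, tspecials. */
      if (c > 127 || iscntrl(c) || isspace(c)
          || strchr(TSPECIALS, c) != NULL)
        return 0;
    }

  return 1;
}
```

Under those assumptions, "text/plain; charset=UTF-8" passes, while a media type containing a parenthesis, a second slash, or a non-ASCII byte fails. Anything after the delimiter is not examined by this particular loop.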
>
> Meh ... you're right and I'm stupid these days.
>
>> While this may be true for RFC 2045, e-mail can now contain UTF-8 headers (RFC 6530). It is not yet clear, however, whether UTF-8 headers apply to MIME headers, specifically Content-Type. Also, the code above is not *entirely* correct, as a Content-Type header can contain linear whitespace (LWSP), meaning that \r\n(spaces)(more content) is permissible; it gets collapsed to a single space.
>
> "May contain" is not the same as "must contain". The svn:mime-type
> property is not equivalent to the Content-Type header and it's a mistake
> to assume it is; e.g., as you note, it may not contain newlines, but
> that does not prevent us from populating Content-Type from
> svn:mime-type. That is a one-way path; we don't ever try to set
> svn:mime-type from the value of some arbitrary Content-Type header.
>
> After reading RFCs 6530 and 6532, I conclude that Content-Type can now
> contain UTF-8 since neither document explicitly forbids that. However,
> we can't extend the domain of svn:mime-type property values because we
> must maintain backwards compatibility.
>
> So, yes, you're restricted to using quoting kluges if you want to embed
> Unicode characters in the value. You'll also note that you can't use
> comments in svn:mime-type, because we forbid parentheses.
>
> All that said, I'm sure we'd consider a client-side patch that would
> allow users to use UTF-8 when setting svn:mime-type in the client, or
> via the svn_client API, and would do the necessary quoting and unquoting
> transparently. Do you think you're up to having a go at producing such a
> patch?
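
The thread does not say which quoting scheme such a patch would use. One candidate is the extended parameter syntax of RFC 2231, where a value is written as name*=charset'language'percent-encoded-bytes. The following is only a rough sketch of the encoding half under that assumption; encode_ext_param and is_attribute_char are hypothetical names, not svn_client functions, and the language tag is left empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RFC 2231 attribute-char: printable US-ASCII except SPACE, '*', '\'',
   '%' and the tspecials. Everything else gets percent-encoded. */
static int
is_attribute_char(unsigned char c)
{
  return c > 32 && c < 127 && strchr("*'%()<>@,;:\\\"/[]?=", c) == NULL;
}

/* Hypothetical helper: write NAME*=UTF-8''<percent-encoded VALUE> to OUT.
   Returns 0 on success, -1 if OUT (OUT_SIZE bytes) is too small. */
int
encode_ext_param(const char *name, const char *utf8_value,
                 char *out, size_t out_size)
{
  static const char hex[] = "0123456789ABCDEF";
  static const char prefix[] = "*=UTF-8''";
  const unsigned char *p;
  size_t name_len, pos;

  assert(name != NULL && utf8_value != NULL && out != NULL);

  name_len = strlen(name);
  /* sizeof(prefix) includes the terminating NUL we need room for. */
  if (name_len + sizeof(prefix) > out_size)
    return -1;
  memcpy(out, name, name_len);
  memcpy(out + name_len, prefix, sizeof(prefix) - 1);
  pos = name_len + sizeof(prefix) - 1;

  for (p = (const unsigned char *) utf8_value; *p != '\0'; p++)
    {
      size_t need = is_attribute_char(*p) ? 1 : 3;

      /* Always keep one byte free for the terminating NUL. */
      if (pos + need + 1 > out_size)
        return -1;

      if (need == 1)
        out[pos++] = (char) *p;
      else
        {
          out[pos++] = '%';
          out[pos++] = hex[*p >> 4];
          out[pos++] = hex[*p & 0x0F];
        }
    }

  out[pos] = '\0';
  return 0;
}
```

With a parameter named title and the UTF-8 value "café menu", this yields title*=UTF-8''caf%C3%A9%20menu, which is pure US-ASCII. A real patch would also need the matching decoder and would have to settle whether this scheme is the one wanted.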

Yes, I am willing to work on that.

I will probably not have time for the next couple of weeks, but will try to allocate time in November.

(If anyone wants to take it in the interim, go ahead. Otherwise I’ll see what I can do later this month.)

Best regards,

Sean
Received on 2014-11-01 14:09:22 CET

This is an archived mail posted to the Subversion Dev mailing list.