[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn:charset

From: Karl Fogel <kfogel_at_red-bean.com>
Date: Tue, 24 Jun 2008 15:18:55 -0400

Dag-Erling Smørgrav <des_at_des.no> writes:
> All I'm asking for is official sanction for the use of the svn:charset
> property to store the IANA character set name that corresponds to the
> file's encoding, independently of its media type. This is how it's
> being used today by several Open Source projects (a quick search
> uncovers Adium, Mono, Tartarus, Growl and Maccode, not counting projects
> I'm involved in) and by at least one third-party Subversion client
> (SmartSVN, based on SVNKit).
>
> One argument in favor of svn:charset, independently of the above, is
> that unlike the current trick of appending "; charset=%s" to the MIME
> type, it works with auto-props.

One advantage of appending to svn:mime-type is that when we serve out
the mime-type, those consumers that are prepared to handle a charset
addendum get it for free. If that information were in svn:charset
instead, then when we serve out the mime-type (say, over HTTP through
mod_dav_svn), would we want to "; " plus the value of svn:charset? If
so, what do we do when svn:mime-type already specifies a charset, and
it's not the same as what svn:charset specifies? (Or is the solution to
that to check at propset time, and try to avoid ever letting them
conflict on the same file?)

I'm not asking to be difficult, and I'm not opposed in principle to your
proposal. But there's more involved here than just giving "official
sanction". When the Subversion project gives official sanction by
putting something in the svn:namespace, we like it to mean we've thought
through the edge cases and handled them :-).

I wish the current svn:charset producers had not used the "svn:"
namespace, because now there may be compatibility concerns that we never
bargained for.

> I've attached a patch relative to trunk that:
>
> - adds svn:charset to svn_props.h;
> - adds it to the help text for propset;
> - updates the French and Norwegian translations accordingly (this
> doesn't seem to work, but they didn't work before I changed them, nor
> do they work in any other language I've tried);
> - modifies libsvn_wc to disallow svn:charset on non-file nodes, like it
> does for svn:mime-type;
> - modifies mod_dav_svn to take svn:charset into account when generating
> the Content-Encoding header.

Thank you for doing this work. It always helps to post a log message
along with the patch. I guess the above is the log message; we can
reformat it into the format specified by

   http://subversion.tigris.org/hacking.html#log-messages

(referenced from http://subversion.tigris.org/hacking.html#patches.)

Your patch uses TAB for indentation sometimes, and SPACE other times.
The TAB chars make the indendation be off due to quoting levels when
expanded inline in an email (such as this reply). It's no big deal, I
just mention it in case it's easy for you to use SPACE everywhere.

On to the substance of the patch:

> Index: subversion/mod_dav_svn/liveprops.c
> ===================================================================
> --- subversion/mod_dav_svn/liveprops.c (revision 31863)
> +++ subversion/mod_dav_svn/liveprops.c (working copy)
> @@ -434,6 +434,7 @@
> safe (and consistent) to assume the same on the server. */
> svn_string_t *pval;
> const char *mime_type = NULL;
> + const char *charset = NULL;
>
> if (resource->baselined && resource->type == DAV_RESOURCE_TYPE_VERSION)
> return DAV_PROP_INSERT_NOTSUPP;
> @@ -478,9 +479,27 @@
> svn_error_clear(serr);
> return DAV_PROP_INSERT_NOTDEF;
> }
> +
> + if ((serr = svn_fs_node_prop(&pval, resource->info->root.root,
> + resource->info->repos_path,
> + SVN_PROP_CHARSET, p)))
> + {
> + svn_error_clear(serr);
> + pval = NULL;
> + }
> +
> + if (pval)
> + charset = pval->data;
> }
>
> - value = mime_type;
> + if (charset != NULL)
> + {
> + value = apr_psprintf(p, "%s; charset=%s", mime_type, charset);
> + }
> + else
> + {
> + value = mime_type;
> + }
> break;
> }

This doesn't handle the case where mime_type already has an appended
charset.

> Index: subversion/svn/main.c
> ===================================================================
> --- subversion/svn/main.c (revision 31863)
> +++ subversion/svn/main.c (working copy)
> @@ -747,6 +747,7 @@
> " whether to merge the file, and how to serve it from Apache.\n"
> " A mimetype beginning with 'text/' (or an absent mimetype) is\n"
> " treated as text. Anything else is treated as binary.\n"
> + " svn:charset - the character encoding of the file.\n"
> " svn:externals - A newline separated list of module specifiers,\n"
> " each of which consists of a relative directory path, optional\n"
> " revision flags and an URL. The ordering of the three elements\n"

Need to specify the namespace for encodings (i.e., whatever the official
way to refer to the IANA list is).

> Index: subversion/include/svn_props.h
> ===================================================================
> --- subversion/include/svn_props.h (revision 31863)
> +++ subversion/include/svn_props.h (working copy)
> @@ -235,6 +235,9 @@
> /** The mime-type of a given file. */
> #define SVN_PROP_MIME_TYPE SVN_PROP_PREFIX "mime-type"
>
> +/** The character encoding of a given file. */
> +#define SVN_PROP_CHARSET SVN_PROP_PREFIX "charset"
> +
> /** The ignore patterns for a given directory. */
> #define SVN_PROP_IGNORE SVN_PROP_PREFIX "ignore"

By the way, is "charset" the standard word for this? I know we use it
informally this way, but as character set and encoding can sometimes be
different, there might be a more formally correct term. Thoughts?

I haven't looked through the code to see if there are other spots that
would be affected by this.

Best,
-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-06-24 21:19:38 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.