Re: svn:charset

From: Dag-Erling Smørgrav <des_at_des.no>
Date: Wed, 25 Jun 2008 10:07:48 +0200

Karl Fogel <kfogel_at_red-bean.com> writes:
> One advantage of appending to svn:mime-type is that when we serve out
> the mime-type, those consumers that are prepared to handle a charset
> addendum get it for free.

Yes, but it doesn't work with auto-props.

> If that information were in svn:charset
> instead, then when we serve out the mime-type (say, over HTTP through
> mod_dav_svn), would we want to "; " plus the value of svn:charset?

Yes, this is what my patch does.

> If
> so, what do we do when svn:mime-type already specifies a charset, and
> it's not the same as what svn:charset specifies?

AFAIK, the client will ignore the first one.

> (Or is the solution to
> that to check at propset time, and try to avoid ever letting them
> conflict on the same file?)

I'd rather check at propset time that the contents of svn:mime-type
match /^[[:alnum:]-]+/[[:alnum:]-]+$/...
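A propset-time check along those lines could be sketched as follows. This is a minimal Python sketch, not the patch itself: the name `is_bare_mime_type` is hypothetical, and because Python's `re` module has no POSIX `[[:alnum:]]` class, the ASCII range `[A-Za-z0-9-]` is used as an assumed stand-in.

```python
import re

# Assumed ASCII equivalent of ^[[:alnum:]-]+/[[:alnum:]-]+$ from the message.
_BARE_MIME_TYPE = re.compile(r"[A-Za-z0-9-]+/[A-Za-z0-9-]+")


def is_bare_mime_type(value: str) -> bool:
    """Return True if value is a plain type/subtype with no parameters."""
    # fullmatch anchors at both ends, so "; charset=..." suffixes are rejected.
    return _BARE_MIME_TYPE.fullmatch(value) is not None
```

With this pattern, `text/plain` passes while `text/plain; charset=utf-8` is rejected. Note that, as written, the pattern would also reject subtypes containing `+` or `.`, such as `image/svg+xml`.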

> > I've attached a patch relative to trunk that:
> >
> > - adds svn:charset to svn_props.h;
> > - adds it to the help text for propset;
> > - updates the French and Norwegian translations accordingly (this
> > doesn't seem to work, but they didn't work before I changed them, nor
> > do they work in any other language I've tried);
> > - modifies libsvn_wc to disallow svn:charset on non-file nodes, like it
> > does for svn:mime-type;
> > - modifies mod_dav_svn to take svn:charset into account when generating
> > the Content-Type header.
> Thank you for doing this work. It always helps to post a log message
> along with the patch.

I expect there will be changes before it's committed, anyway.

> Your patch uses TAB for indentation sometimes, and SPACE other times.
> The TAB chars make the indentation be off due to quoting levels when
> expanded inline in an email (such as this reply). It's no big deal, I
> just mention it in case it's easy for you to use SPACE everywhere.

It's just a matter of telling Emacs to DTRT.

> This doesn't handle the case where mime_type already has an appended
> charset.

See above; but it should be trivial to strip everything after the
semicolon in svn:mime-type (if and only if svn:charset is present).
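The stripping rule can be illustrated with a short sketch. It is written in Python for brevity rather than as mod_dav_svn code, and the function name `content_type` and its parameters are hypothetical; it only shows the "strip if and only if svn:charset is present" logic.

```python
from typing import Optional


def content_type(mime_type: str, charset: Optional[str] = None) -> str:
    """Combine svn:mime-type and svn:charset into one header value."""
    if not charset:
        # No svn:charset: serve svn:mime-type untouched, parameters included.
        return mime_type
    # svn:charset present: drop everything after the first semicolon,
    # then append the charset from the dedicated property.
    base = mime_type.split(";", 1)[0].strip()
    return f"{base}; charset={charset}"
```

Under these assumptions, a file with svn:mime-type `text/plain; charset=iso-8859-1` and svn:charset `utf-8` would be served as `text/plain; charset=utf-8`, so the two properties can never produce conflicting charset parameters in the response.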

> Need to specify the namespace for encodings (i.e., whatever the official
> way to refer to the IANA list is).

http://www.iana.org/assignments/character-sets

In practice, most systems only know a subset of these; in particular,
most systems don't know all the names for each character set. The
correct name for iso-8859-1, for instance, is ISO_8859-1:1987, but the
former is the preferred name for MIME.
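To illustrate the aliasing problem, here is a small sketch that uses Python's standard `codecs` registry as one example of a system's own alias table. It is not the IANA registry, and the helper name `canonical_codec` is hypothetical. Names the local system does not know simply yield `None`.

```python
import codecs
from typing import Optional


def canonical_codec(name: str) -> Optional[str]:
    """Return the local codec name for a charset label, or None if unknown."""
    try:
        # codecs.lookup resolves aliases known to this Python installation.
        return codecs.lookup(name).name
    except LookupError:
        # The label is not in this system's subset of known names.
        return None
```

Comparing `canonical_codec("ISO_8859-1:1987")` with `canonical_codec("iso-8859-1")` is one way to decide whether two labels refer to the same character set on a given system, without hard-coding the full IANA list.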

BTW, I suspect the reason why translation doesn't work is that the
message IDs are too long. If I were you, I'd use short symbolic message
IDs (e.g. "SVN_HELP_PROPSET_LONG") and place the full text in an en.po
file.

> By the way, is "charset" the standard word for this? I know we use it
> informally this way, but as character set and encoding can sometimes be
> different, there might be a more formally correct term. Thoughts?

Strictly speaking, Unicode is a character set while iso-8859-1 is a
character encoding of a specific subset of Unicode, but historically,
iso-8859-1 and the like have been called character sets, and "charset" is
what MIME uses.

DES

-- 
Dag-Erling Smørgrav - des_at_des.no

Received on 2008-06-25 10:08:11 CEST
