
Re: Missing LOCALE in post-commit hook leads to weird behaviour of `svnlook log` with unicode characters – broken transliterations

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Mon, 29 Jan 2018 09:50:09 +0100

On Sat, Jan 27, 2018 at 6:35 PM, H.-Dirk Schmitt <dirk_at_computer42.org> wrote:
> I found a very weird behaviour of `svnlook log` that IMHO is a bug (or
> at least a serious missing documentation issue).
> Introduction
> ------------
> Consider a log message like: 'Unicode Test → ø ÄÖÜ'
> `svnlook log` invoked in a normal terminal session shows the proper
> content.
> This works because the environment is set to 'en_US.UTF-8'.
> Experimenting further, `env LC_ALL=C.UTF-8 svnlook log` also shows a
> correct result.
> Problem
> -------
> But with `env LC_ALL=C svnlook log` I got a badly broken
> result:
> Unicode Test {U+2192} {U+00F8} AOU
> → and ø are replaced with their Unicode code point notation.
> The German umlauts are transliterated in a very uncommon way.
> In the old ASCII/typewriter days Ä was transliterated as Ae (Ö → Oe,
> …)
> Why this behaviour is not a cosmetic problem
> --------------------------------------------
> Consider a post-commit hook fetching the commit message with `svnlook
> log`.
> The purpose is to post-process the log message content, e.g. to append
> it to Bugzilla issues.
> The actual setup is svn+apache2 with a bash script as the post-commit hook.
> The machine locale as reported by `localectl` is: System Locale:
> LANG=en_US.utf8
> The content of every commit message transferred this way is broken as
> described above.
> This happens because the post-commit hook is running with a very
> reduced set of environment variables:
> PWD=/
> In particular, neither `LC_ALL` nor `LANG` is set, which is equivalent
> to `LC_ALL=C`.
> Suggested Mitigation/Fixing
> ---------------------------
> 1. Subversion should ensure that the system locale is forwarded to the
> post-commit hook.
> 2. `svnlook` should support the `--encoding` switch.
> 3. German umlauts (and surely some other national characters in the
> 8-bit range) shouldn't be transliterated differently from other
> Unicode characters (see ø / {U+00F8}).
> PS: Google et al. haven't shown that this issue is well documented.

This is covered in the official documentation (the "SVN Book"):

(see the first sentence there: "By default, Subversion executes hook
scripts with an empty environment—that is, no environment variables
are set at all, not even $PATH (or %PATH%, under Windows).")
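A minimal workaround sketch, not taken from the SVN Book: because the hook starts with an empty environment, the hook script itself can set `PATH` and a UTF-8 locale before calling `svnlook`. It assumes bash, that the `en_US.UTF-8` locale is installed on the server (check with `locale -a`), and that `svnlook` lives in one of the listed directories; the function names `setup_hook_env` and `main` are hypothetical.

```bash
#!/bin/bash
# post-commit hook sketch. Subversion runs hooks with an empty environment,
# so PATH and the locale are set explicitly before svnlook is called.
# ASSUMPTIONS: en_US.UTF-8 is installed (see `locale -a`) and svnlook is
# found in one of the PATH directories below.

setup_hook_env() {
    export PATH=/usr/local/bin:/usr/bin:/bin
    # LC_ALL overrides LANG and every other LC_* variable.
    export LC_ALL=en_US.UTF-8
}

main() {
    local repos="$1" rev="$2"
    setup_hook_env
    # With a UTF-8 locale, svnlook prints non-ASCII characters as-is
    # instead of {U+XXXX} escapes or stripped umlauts.
    local msg
    msg="$(svnlook log -r "$rev" "$repos")" || exit 1
    # Hypothetical post-processing step, e.g. forwarding to Bugzilla.
    printf '%s\n' "$msg"
}

# post-commit is called with REPOS and REV; only run main when both exist.
if [ "$#" -ge 2 ]; then
    main "$@"
fi
```

Subversion 1.8 and later also allow setting hook environment variables per repository in `conf/hooks-env` (or through the `SVNHooksEnv` directive for mod_dav_svn), which avoids repeating this setup in every hook script.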

Received on 2018-01-29 09:50:43 CET

This is an archived mail posted to the Subversion Users mailing list.