
Missing LOCALE in post-commit hook leads to weird behaviour of `svnlook log` with unicode characters – broken transliterations

From: H.-Dirk Schmitt <admin_at_computer42.org>
Date: Sat, 27 Jan 2018 18:33:27 +0100

I found a very weird behaviour of `svnlook log` that IMHO is a bug (or
at least a serious gap in the documentation).

Introduction
------------

Consider a log message like: 'Unicode Test → ø ÄÖÜ'

`svnlook log` invoked in a normal terminal session shows the proper
content.
This works because the locale in that environment is set to 'en_US.UTF-8'.

Experimenting further, `env LC_ALL=C.UTF-8 svnlook log` also shows the
correct result.

Problem
-------
But after falling back to `env LC_ALL=C svnlook log` I get a badly
mangled result:

Unicode Test {U+2192} {U+00F8} AOU

→ and ø are replaced with their code point notation.
The German umlaut characters are transliterated in a very uncommon way.
In the old ASCII/typewriter days Ä was transliterated as Ae (Ö → Oe,
…)

Why this behaviour is not a cosmetic problem
--------------------------------------------

Consider a post-commit hook fetching the commit message with `svnlook
log`.
The purpose is to post-process the log message content, e.g. to append
it to Bugzilla issues.

The actual setup is svn+apache2 with a bash script as the post-commit hook.
The machine locale as reported by `localectl` is: System Locale:
LANG=en_US.utf8

All commit message content transferred this way is broken as described
above.

This happens because the post-commit hook is running with a very
reduced set of environment variables:
   PWD=/
   SHLVL=1

In particular, neither `LC_ALL` nor `LANG` is set, which is equivalent
to `LC_ALL=C`.
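
To make the fallback explicit, here is a minimal sketch of the usual
POSIX lookup order for character handling (`LC_ALL`, then `LC_CTYPE`,
then `LANG`). The helper name `effective_ctype` is made up for this
illustration, and the final default is implementation-defined; `C` is
what I assume here, matching the behaviour observed above.

```shell
#!/bin/bash
# effective_ctype (hypothetical helper): print the locale name that
# would apply to character handling, following the POSIX precedence
# LC_ALL > LC_CTYPE > LANG. Empty values count as unset.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the implementation-defined default is the C locale.
        printf 'C\n'
    fi
}
```

Calling `effective_ctype` at the top of a hook script and writing the
result to a log file is one way to check what the hook really runs with.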

Suggested Mitigation/Fixing
---------------------------
1. Subversion should ensure that the system locale is forwarded to the
post-commit hook.
2. `svnlook` should support the `--encoding` switch.
3. German umlauts (and surely some other national characters in the
   8-bit range) shouldn't be transliterated in a different
   way than other Unicode characters (see ø / {U+00F8}).
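
Until something like the above exists, a possible workaround is to set
the locale inside the hook script itself. This is only a sketch: the
function name `fetch_log_message` and the `SVNLOOK` override variable
are made up for this example, and it assumes that the `C.UTF-8` locale
is available on the server (otherwise `en_US.UTF-8` or similar would
have to be used instead).

```shell
#!/bin/bash
# Sketch of a post-commit hook. Subversion calls the hook with the
# repository path as $1 and the new revision number as $2.

# fetch_log_message (hypothetical helper): print the log message of
# revision $2 in repository $1, with a UTF-8 locale forced for the
# svnlook call only. SVNLOOK is an assumed override for the binary path.
fetch_log_message() {
    local repos="$1" rev="$2"
    LC_ALL=C.UTF-8 "${SVNLOOK:-svnlook}" log -r "$rev" "$repos"
}

# Only run when invoked as a hook, i.e. with both arguments present.
if [ "$#" -ge 2 ]; then
    fetch_log_message "$1" "$2"
fi
```

Setting `LC_ALL` only for the one command keeps the rest of the hook
unaffected; `export LC_ALL=C.UTF-8` at the top of the script would apply
it to everything the hook starts.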

PS: Google et al. haven't turned up anything suggesting that this issue is well documented.
Received on 2018-01-29 07:29:54 CET

This is an archived mail posted to the Subversion Users mailing list.
