
Re: Missing LOCALE in post-commit hook leads to weird behaviour of `svnlook log` with unicode characters – broken transliterations

From: Stefan Sperling <stsp_at_elego.de>
Date: Mon, 29 Jan 2018 18:14:39 +0100

On Mon, Jan 29, 2018 at 04:46:09PM +0100, H.-Dirk Schmitt wrote:
> OK - My „Postscriptum“ was not correct - my apologies.
> But these points are still valid:
> - Broken transliteration of German Umlaut.

I don't see a reason to add support for transliteration if
the locale is incompatible. Just use UTF-8. Paths and log
messages are always stored as UTF-8 inside Subversion anyway.

> - Subversion is ignoring the machine locale settings, which should
> normally be the default if not overridden in the environment. This is
> considerably bad behaviour for a linux/unix application.

Generally, I agree that unix applications should heed locale settings,
but servers are a special case.

As mentioned in http://subversion.apache.org/docs/release-notes/1.8.html#mod-dav-svn-utf8
the locale behaviour is the result of a policy decision made by the
Apache HTTPD project, namely that all Apache modules run in the "C"
locale and only the "C" locale, even if the system default locale is
something else! Apache HTTPD does not call the setlocale() function.
This is a reasonable trade-off because locale-dependent behaviour could
potentially result in security issues in the webserver. And therefore,
having a webserver module like mod_dav_svn fiddle with the locale and/or
the environment of the running server would be frowned upon.

Hook scripts are generally only interested in the character set
anyway, i.e. LC_CTYPE. All the other locale settings (LC_TIME,
LC_MESSAGES, LC_NUMERIC, etc.) are not critical for hook scripts.
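
To illustrate the LC_CTYPE point, here is a minimal sketch of a hook helper
in Python that overrides only the character set before calling svnlook. The
locale name "en_US.UTF-8", the svnlook path, and the helper names utf8_env()
and commit_log() are assumptions made for this example; the locale must
actually be installed on the server for this to have any effect.

```python
import os
import subprocess


def utf8_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with only LC_CTYPE forced to UTF-8.

    locale_name is an assumption; use a UTF-8 locale that exists on the host.
    """
    env = dict(os.environ if base is None else base)
    # LC_ALL takes precedence over LC_CTYPE, so drop it if present.
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = locale_name
    return env


def commit_log(repos, rev, svnlook="/usr/bin/svnlook"):
    """Return the log message of revision rev as a Python string.

    The svnlook path is hypothetical; hooks often run without PATH set,
    so an absolute path is used here.
    """
    result = subprocess.run(
        [svnlook, "log", "-r", str(rev), repos],
        env=utf8_env(),
        stdout=subprocess.PIPE,
        check=True,
    )
    # With a UTF-8 LC_CTYPE, svnlook is expected to emit UTF-8 bytes.
    return result.stdout.decode("utf-8")
```

A post-commit hook receives the repository path and the revision number as
its first two arguments, so it could call commit_log(sys.argv[1], sys.argv[2]).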

So we added a custom UTF-8 option to mod_dav_svn to allow SVN users to
configure hook script environments in a way that the default HTTPD
behaviour won't allow for, and to set the character set to UTF-8.
Environment variables set this way are only seen by hook scripts and
do not affect the HTTPD server in any way.

I believe this solution gives you the best of both worlds.

Note that using character sets other than ASCII in hook scripts was
impossible for many years. The move from ASCII to UTF-8 already happened
a couple of years ago. I don't think changing this behaviour again would
be worthwhile at this point.
See https://issues.apache.org/jira/browse/SVN-2487 in our bug database.
Received on 2018-01-29 18:15:34 CET

This is an archived mail posted to the Subversion Users mailing list.