Re: mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py)

From: Stefan Sperling <stsp_at_apache.org>
Date: Mon, 28 Dec 2020 17:08:16 +0100

On Mon, Dec 21, 2020 at 07:15:53AM +0000, Daniel Shahaf wrote:
> stsp_at_apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> > URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> > Log:
> > Make mailer.py work properly with Python 3, and drop Python 2 support.
> >
> > Most of the changes deal with the handling binary data vs Python strings.
> >
> > I've made sure that mailer.py will work in a UTF-8 environment. In general,
> > UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> > Environments using other encodings may not work as expected, but those will
> > be problematic for hook scripts in general.
>
> Correct me if I'm wrong, but it sounds like you haven't ruled out the
> possibility that this commit will constitute a regression for anyone
> who runs mailer.py in a non-UTF-8 environment and will upgrade to this
> commit.

Anyone who didn't explicitly set a locale in hooks-env will run this
script in the C locale today, and this case still works.

I would not count broken email generated in non-UTF-8 locales as a regression.
mailer.py's email content encoding header was set to UTF-8 17 years ago:
https://svn.apache.org/r847862

> I suppose it's fair to classify non-UTF-8 environments as "patches
> welcome", following the precedent of libmagic support in the Windows
> build, but:
>
> 1. Can we detect non-UTF-8 environments and warn or error out hard upon
> them? «locale.getlocale()[1]» seems promising?

I don't see what value this adds. When there is an encoding problem it
will already be obvious from garbage characters in email notifications.
And many people can live with that, so there is no reason to error out.
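
For reference, a warn-only version of the check suggested above could look
roughly like the sketch below. This is not code from mailer.py. The helper
names is_utf8_or_ascii() and warn_if_unexpected_locale() are made up for
illustration, and I use locale.getpreferredencoding(False) instead of
locale.getlocale()[1] because it does not return None when no locale is set.

```python
import codecs
import locale
import sys


def is_utf8_or_ascii(encoding_name):
    """Return True if encoding_name is an alias of UTF-8 or ASCII."""
    try:
        # codecs.lookup() maps aliases ("UTF8", "US-ASCII", ...) to one
        # canonical codec name.
        canonical = codecs.lookup(encoding_name).name
    except (LookupError, TypeError):
        # Unknown encoding name, or None.
        return False
    return canonical in ("utf-8", "ascii")


def warn_if_unexpected_locale(stream=sys.stderr):
    """Hypothetical helper: warn, but do not fail, in other locales."""
    encoding = locale.getpreferredencoding(False)
    if is_utf8_or_ascii(encoding):
        return True
    stream.write("warning: locale encoding is %r; notification emails "
                 "may contain garbage characters\n" % (encoding,))
    return False
```

Such a check only looks at the environment of the hook script. It says
nothing about the encoding of the files in a commit.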

Given that file diffs can contain an arbitrary mix of encodings, this script
cannot guarantee readable diff output unless ASCII/UTF-8 encoding is used
for all files. I don't see a good way around this limitation. Auto-detecting
file encoding is tricky, and requiring people to tag file encodings via the
svn:mime-type property is simply not going to work in practice.
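
To illustrate the limitation, here is a minimal sketch of what a script can
do with diff bytes of unknown encoding. It is not necessarily what mailer.py
does, and decode_diff_line() is a hypothetical name. The idea is to try
UTF-8 first, optionally try further encodings the administrator chose, and
otherwise substitute replacement characters for undecodable bytes.

```python
def decode_diff_line(raw, encodings=("utf-8",)):
    """Decode one line of diff output given as bytes.

    Each encoding in `encodings` is tried in order. If none of them
    decodes the line cleanly, fall back to UTF-8 with replacement
    characters, so a message can still be generated.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Undecodable bytes become U+FFFD instead of raising an exception.
    return raw.decode("utf-8", errors="replace")


# UTF-8 content decodes as expected.
print(decode_diff_line(b"+caf\xc3\xa9"))
# A Latin-1 encoded line is only readable if Latin-1 was listed explicitly.
print(decode_diff_line(b"+caf\xe9", encodings=("utf-8", "latin-1")))
```

A fallback list like this cannot tell Latin-1 from any other single-byte
encoding, because every byte sequence is valid Latin-1. So this still amounts
to guessing.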

> 2. A change that hasn't been confirmed *not* to constitute a regression
> merits a release notes entry. Would you do the honours?

I think the release notes should mention Python 3 support for mailer.py and
recommend that mailer.py be run in a UTF-8 locale or the C locale.
Received on 2020-12-28 17:08:23 CET

This is an archived mail posted to the Subversion Dev mailing list.