
Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Tue, 15 Dec 2020 03:51:03 +0900

First of all, thank you for doing this. I have long been worried that
mailer.py did not support Python 3.

In message <20201214165710.DEB3517BCBF_at_svn01-us-east.apache.org>,
stsp_at_apache.org writes:
>Author: stsp
>Date: Mon Dec 14 16:57:10 2020
>New Revision: 1884427
>URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>Make mailer.py work properly with Python 3, and drop Python 2 support.
>Most of the changes deal with the handling binary data vs Python strings.
>I've made sure that mailer.py will work in a UTF-8 environment. In general,
>UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>Environments using other encodings may not work as expected, but those will
>be problematic for hook scripts in general.

Perhaps the only problem in locales other than UTF-8 is the interpretation
of the REPOS-PATH argument. The other paths in the script are all
Subversion's internal paths.

> SVN repositories store internal
>data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
>or decoding of such data, and thus need to work with raw UTF-8 strings, not
>Python strings.

I have no objection to the code itself, but this description is a bit
inaccurate.

Our Python 3 bindings accept str objects as inputs for char * arguments.
They always convert them to UTF-8 C strings in all APIs, regardless of
Python's preferred encoding or file system encoding. Because some APIs
accept local paths, which should be encoded in the locale's
charset/encoding, I also prefer to encode to bytes explicitly on the
caller side before calling the APIs, to avoid introducing bugs. Of course,
the return values of wrapper APIs that correspond to char ** arguments or
to the return value of the C API are not decoded; they are raw bytes
objects.
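
A minimal sketch of the "encode on the caller side" approach described
above. The helper names are hypothetical and are not part of mailer.py or
the bindings, and using os.fsencode() for local paths is an assumption
about what "the locale's charset/encoding" should mean in a given setup:

```python
import os

def encode_internal_path(path: str) -> bytes:
    # Hypothetical helper: Subversion's internal paths are UTF-8,
    # so encode them explicitly instead of passing str to the bindings.
    return path.encode("utf-8")

def encode_local_path(path: str) -> bytes:
    # Hypothetical helper: local paths (such as REPOS-PATH) follow the
    # file system encoding, which os.fsencode() applies.
    return os.fsencode(path)

def decode_internal_path(raw: bytes) -> str:
    # Values coming back from the bindings are raw bytes; decode them
    # as UTF-8 only at the point where text is needed.
    return raw.decode("utf-8")

internal = encode_internal_path("trunk/ドキュメント.txt")
local = encode_local_path("/var/svn/repos")
roundtrip = decode_internal_path(internal)
```

Keeping the two encoders separate makes it visible at each call site which
kind of path is being passed.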

This commit also fixes an issue where the Subject could be truncated in
the middle of a multi-byte character sequence, and perhaps it also changes
the unit of the truncate_subject config parameter from octets of the raw
subject to characters of the raw subject. These are not equal if a subject
contains multi-byte characters.
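
To illustrate the difference between the two units, here is a small
sketch. The function names and the sample subject are made up for
illustration and are not taken from mailer.py:

```python
def truncate_octets(subject: str, limit: int) -> bytes:
    # Truncate the UTF-8 encoded subject by octets; this can cut
    # a multi-byte character sequence in the middle.
    return subject.encode("utf-8")[:limit]

def truncate_chars(subject: str, limit: int) -> str:
    # Truncate the subject by characters; every character stays whole.
    return subject[:limit]

subject = "r1: 日本語"  # 4 ASCII characters, then 3 three-byte characters

by_octets = truncate_octets(subject, 5)
by_chars = truncate_chars(subject, 5)

try:
    by_octets.decode("utf-8")
    octet_cut_is_valid = True
except UnicodeDecodeError:
    # The fifth octet is only the first byte of a three-byte sequence.
    octet_cut_is_valid = False

chars_octet_length = len(by_chars.encode("utf-8"))
```

With the same limit, the character-based result is longer in octets than
the octet-based one whenever the kept part contains multi-byte characters.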

The RFC5321 maximum line length violation that I pointed out in January
might be fixed by this commit, because of a fix in the
email.header.Header implementation in the Python 3 library, but I have not
confirmed this yet.
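
One possible way to check this, sketched with the standard library only.
The sample subject is invented, and the 998-octet limit (excluding CRLF)
is an assumption about which limit applies; this does not exercise
mailer.py itself:

```python
from email.header import Header

# Assumed limit: 998 octets per line, not counting the trailing CRLF.
SMTP_LINE_LIMIT = 998

# A long subject with many multi-byte characters (illustrative only).
subject = "r1884427 - " + "日本語のパス/" * 40

# Header folds the encoded value into several lines when it is too long
# to fit on one.
header = Header(subject, charset="utf-8", header_name="Subject")
folded = header.encode()

lines = folded.splitlines()
longest = max(len(line.encode("ascii")) for line in lines)
within_limit = longest <= SMTP_LINE_LIMIT
```

If within_limit is true for realistic subjects, the header part at least
should no longer exceed the limit; the message body would need a separate
check.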


Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Received on 2020-12-14 19:52:10 CET

This is an archived mail posted to the Subversion Dev mailing list.
