On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
> First of all, thank you for doing this. I worried that mailer.py didn't
> support Python 3 for a long time.
> In message <20201214165710.DEB3517BCBF_at_svn01-us-east.apache.org>, stsp_at_apache.o
> rg writes:
> >Author: stsp
> >Date: Mon Dec 14 16:57:10 2020
> >New Revision: 1884427
> >URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> >Make mailer.py work properly with Python 3, and drop Python 2 support.
> >Most of the changes deal with the handling binary data vs Python strings.
> >I've made sure that mailer.py will work in a UTF-8 environment. In general,
> >UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> >Environments using other encodings may not work as expected, but those will
> >be problematic for hook scripts in general.
> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
> REPOS-PATH argument. Other paths in the script are all Subversion's
> internal paths.
Yes, I agree that the REPOS-PATH argument is an exceptional case.
Unforunately, we have to decode repos_dir because it is used with the
There is no way of knowing which character encoding is used for paths.
At least on UNIX a filename is just a string of bytes and it is independent
of the locale.
Note that hook scripts will run in an empty environment by default.
Which means this script will run in the "C" locale by default.
If another locale is desired, the administrator needs to configure the
server as documented here:
My assumption is that for hook scripts like this one, the server will be
configured with SVNUseUTF8 and the hook script environment will contain
something like LANG=C.UTF-8.
If svnadmin is then also used in the UTF-8 locale to create repositories,
the hook script should work fine.
> >data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> >or decoding of such data, and thus need to work with raw UTF-8 strings, not
> >Python strings.
> I have no objection the code itself, however it is a bit incorrect.
> Our Python3 bindings accept str objects as inputs for char *. It always
> convert them to UTF-8 C strings on all APIs without regarding preferred
> encoding or file system encoding on Python. As some of APIs accept
> local paths and they should be encoded as locale's charset/encoding,
> I also prefer to encode to bytes on caller side explicitly before call
> APIs, to avoid making bugs. Of course, return values of wrapper APIs
> corresponding char ** args and return value of C API are not decoded,
> raw bytes objects.
Thank you, I did not know this. The patch below undoes changes which I
made in order to encode input to SVN APIs as UTF-8.
The mailer.py test is still passing fine with this change.
--- tools/hook-scripts/mailer/mailer.py (revision 1884427)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -624,8 +624,7 @@ class Lock(Messenger):
# The lock comment is the same for all paths, so we can just pull
# the comment for the first path in the dirlist and cache it.
- self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr,
+ self.lock = svn.fs.svn_fs_get_lock(self.repos.fs_ptr, self.dirlist,
@@ -910,7 +909,7 @@ class DiffGenerator:
kind = 'C'
diff = svn.fs.FileDiff(None, None, self.repos.root_this,
- change.path.decode('utf-8'), self.pool)
+ change.path, self.pool)
label1 = '/dev/null\t00:00:00 1970\t' \
'(empty, because file is newly added)'
label2 = '%s\t%s\t(r%s, copy of r%s, %s)' \
--- tools/hook-scripts/mailer/tests/mailer-tweak.py (revision 1884427)
+++ tools/hook-scripts/mailer/tests/mailer-tweak.py (working copy)
@@ -53,7 +53,7 @@ def tweak_dates(pool, home='.'):
date = core.svn_time_to_cstring((DATE_BASE+i*DATE_INCR) * 1000000, pool)
fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_DATE, date, pool)
- fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, b'mailer test', pool)
+ fs.change_rev_prop(fsob, i+1, core.SVN_PROP_REVISION_AUTHOR, 'mailer test', pool)
if len(sys.argv) != 2:
Received on 2020-12-14 21:07:04 CET