On 2020/12/15 5:06, Stefan Sperling wrote:
> On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
>> First of all, thank you for doing this. I worried that mailer.py didn't
>> support Python 3 for a long time.
>> In message <20201214165710.DEB3517BCBF_at_svn01-us-east.apache.org>, stsp_at_apache.o
>> rg writes:
>>> Author: stsp
>>> Date: Mon Dec 14 16:57:10 2020
>>> New Revision: 1884427
>>> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>>> Make mailer.py work properly with Python 3, and drop Python 2 support.
>>> Most of the changes deal with the handling binary data vs Python strings.
>>> I've made sure that mailer.py will work in a UTF-8 environment. In general,
>>> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>>> Environments using other encodings may not work as expected, but those will
>>> be problematic for hook scripts in general.
>> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
>> REPOS-PATH argument. Other paths in the script are all Subversion's
>> internal paths.
> Yes, I agree that the REPOS-PATH argument is an exceptional case.
> Unforunately, we have to decode repos_dir because it is used with the
> os.path.join() function.
As far as I read the code again, it seems this is already support
locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv
already decoded in file system encoding (i.e. encoding corresponding
to LC_CTYPE) and Python encodes it if access to file system. On the
other hand, svn.repos.open() wrapper function accepts str and encode
it to UTF-8 string, as the svn_repos_open() C API expects.
Then I overlooked sys.argv[x] and cmd_args[x] in main() are
Perhaps it needs (although the last hunk is not nessesary):
--- tools/hook-scripts/mailer/mailer.py (revision 1884434)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -88,10 +88,10 @@
messenger = Commit(pool, cfg, repos)
elif cmd == 'propchange' or cmd == 'propchange2':
revision = int(cmd_args)
- author = cmd_args.decode('utf-8')
- propname = cmd_args.decode('utf-8')
+ author = cmd_args
+ propname = cmd_args
if cmd == 'propchange2' and cmd_args:
- action = cmd_args.decode('utf-8')
+ action = cmd_args
action = 'A'
repos = Repository(repos_dir, revision, pool)
@@ -103,7 +103,7 @@
messenger = PropChange(pool, cfg, repos, author, propname, action)
elif cmd == 'lock' or cmd == 'unlock':
- author = cmd_args.decode('utf-8')
+ author = cmd_args
repos = Repository(repos_dir, 0, pool) ### any old revision will do
# Override the repos revision author with the author of the lock/unlock
repos.author = author
@@ -1510,7 +1510,7 @@
cmd = sys.argv
- repos_dir = svn.core.svn_path_canonicalize(sys.argv)
+ repos_dir = svn.core.svn_path_canonicalize(sys.argv.encode('utf-8'))
repos_dir = repos_dir.decode('utf-8')
expected_args = cmd_list[cmd]
(Also, I found svn.repos.open() and svn.core.svn_path_canonicalize() already
accepted str in commited code :))
> There is no way of knowing which character encoding is used for paths.
> At least on UNIX a filename is just a string of bytes and it is independent
> of the locale.
> Note that hook scripts will run in an empty environment by default.
> Which means this script will run in the "C" locale by default.
> If another locale is desired, the administrator needs to configure the
> server as documented here:
> My assumption is that for hook scripts like this one, the server will be
> configured with SVNUseUTF8 and the hook script environment will contain
> something like LANG=C.UTF-8.
> If svnadmin is then also used in the UTF-8 locale to create repositories,
> the hook script should work fine.
I assumed that hook scripts kicked by file:// or svn+ssh:// scheme.
I think even if the repos path is neither UTF-8 nor ascii but appropriate
encoding string, hook scripts themselves know it and can invoke mailer.py
script with appropriate LC_CTYPE by using env(1) utility or directly setting
environment variable within hook scripts themselves.
Of course, we can easily move or make symlink of repository path from
non UTF-8 path to UTF-8 or ascii path, so I don't insist that we should
support non UTF-8 path.
>>> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
>>> or decoding of such data, and thus need to work with raw UTF-8 strings, not
>>> Python strings.
>> I have no objection the code itself, however it is a bit incorrect.
>> Our Python3 bindings accept str objects as inputs for char *. ^ as well as bytes objects
If str objects are passed, v
>> It always
>> convert them to UTF-8 C strings on all APIs without regarding preferred
>> encoding or file system encoding on Python. As some of APIs accept
>> local paths and they should be encoded as locale's charset/encoding,
>> I also prefer to encode to bytes on caller side explicitly before call
>> APIs, to avoid making bugs. Of course, return values of wrapper APIs
>> corresponding char ** args and return value of C API are not decoded,
>> raw bytes objects.
> Thank you, I did not know this. The patch below undoes changes which I
> made in order to encode input to SVN APIs as UTF-8.
> The mailer.py test is still passing fine with this change.
Yes, however as I wrote above I prefer not to revert them.
I also found some bytes/str mixture still left around return values
for each call of Repository.get_rev_prop(), however I just found,
but not tested. So I continue to look over the code and I'll try
to fix the problems if they still exist.
Yasuhito FUTATSUKI <futatuki_at_yf.bsclub.org>
Received on 2020-12-15 11:48:55 CET