[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Tue, 15 Dec 2020 19:47:59 +0900

On 2020/12/15 5:06, Stefan Sperling wrote:
> On Tue, Dec 15, 2020 at 03:51:03AM +0900, Yasuhito FUTATSUKI wrote:
>> First of all, thank you for doing this. I worried that mailer.py didn't
>> support Python 3 for a long time.
>>
>> In message <20201214165710.DEB3517BCBF_at_svn01-us-east.apache.org>, stsp_at_apache.o
>> rg writes:
>>> Author: stsp
>>> Date: Mon Dec 14 16:57:10 2020
>>> New Revision: 1884427
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
>>> Log:
>>> Make mailer.py work properly with Python 3, and drop Python 2 support.
>>>
>>> Most of the changes deal with the handling binary data vs Python strings.
>>>
>>> I've made sure that mailer.py will work in a UTF-8 environment. In general,
>>> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
>>> Environments using other encodings may not work as expected, but those will
>>> be problematic for hook scripts in general.
>>
>> Perhaps the problem on locale other than UTF-8 is only on iterpretation of
>> REPOS-PATH argument. Other paths in the script are all Subversion's
>> internal paths.
>
> Yes, I agree that the REPOS-PATH argument is an exceptional case.
> Unforunately, we have to decode repos_dir because it is used with the
> os.path.join() function.

As far as I read the code again, it seems this is already support
locales other than C/C.UTF-8 if mailer.py is kicked with appropriate
LC_CTYPE environment. The REPOS-PATH argment comes from sys.argv[2]
already decoded in file system encoding (i.e. encoding corresponding
to LC_CTYPE) and Python encodes it if access to file system. On the
other hand, svn.repos.open() wrapper function accepts str and encode
it to UTF-8 string, as the svn_repos_open() C API expects.

Then I overlooked sys.argv[x] and cmd_args[x] in main() are
already str.

Perhaps it needs (although the last hunk is not nessesary):
[[[
Index: tools/hook-scripts/mailer/mailer.py
===================================================================
--- tools/hook-scripts/mailer/mailer.py (revision 1884434)
+++ tools/hook-scripts/mailer/mailer.py (working copy)
@@ -88,10 +88,10 @@
     messenger = Commit(pool, cfg, repos)
   elif cmd == 'propchange' or cmd == 'propchange2':
     revision = int(cmd_args[0])
- author = cmd_args[1].decode('utf-8')
- propname = cmd_args[2].decode('utf-8')
+ author = cmd_args[1]
+ propname = cmd_args[2]
     if cmd == 'propchange2' and cmd_args[3]:
- action = cmd_args[3].decode('utf-8')
+ action = cmd_args[3]
     else:
       action = 'A'
     repos = Repository(repos_dir, revision, pool)
@@ -103,7 +103,7 @@
                  })
     messenger = PropChange(pool, cfg, repos, author, propname, action)
   elif cmd == 'lock' or cmd == 'unlock':
- author = cmd_args[0].decode('utf-8')
+ author = cmd_args[0]
     repos = Repository(repos_dir, 0, pool) ### any old revision will do
     # Override the repos revision author with the author of the lock/unlock
     repos.author = author
@@ -1510,7 +1510,7 @@
     usage()
 
   cmd = sys.argv[1]
- repos_dir = svn.core.svn_path_canonicalize(sys.argv[2])
+ repos_dir = svn.core.svn_path_canonicalize(sys.argv[2].encode('utf-8'))
   repos_dir = repos_dir.decode('utf-8')
   try:
     expected_args = cmd_list[cmd]
]]]

(Also, I found svn.repos.open() and svn.core.svn_path_canonicalize() already
accepted str in commited code :))

> There is no way of knowing which character encoding is used for paths.
> At least on UNIX a filename is just a string of bytes and it is independent
> of the locale.
>
> Note that hook scripts will run in an empty environment by default.
> Which means this script will run in the "C" locale by default.
> If another locale is desired, the administrator needs to configure the
> server as documented here:
> https://subversion.apache.org/docs/release-notes/1.8.html#hooks-env
>
> My assumption is that for hook scripts like this one, the server will be
> configured with SVNUseUTF8 and the hook script environment will contain
> something like LANG=C.UTF-8.
>
> If svnadmin is then also used in the UTF-8 locale to create repositories,
> the hook script should work fine.

I assumed that hook scripts kicked by file:// or svn+ssh:// scheme.

I think even if the repos path is neither UTF-8 nor ascii but appropriate
encoding string, hook scripts themselves know it and can invoke mailer.py
script with appropriate LC_CTYPE by using env(1) utility or directly setting
environment variable within hook scripts themselves.

Of course, we can easily move or make symlink of repository path from
non UTF-8 path to UTF-8 or ascii path, so I don't insist that we should
support non UTF-8 path.

>>> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
>>> or decoding of such data, and thus need to work with raw UTF-8 strings, not
>>> Python strings.
>>
>> I have no objection the code itself, however it is a bit incorrect.
>>
>> Our Python3 bindings accept str objects as inputs for char *. ^ as well as bytes objects

                                     If str objects are passed, v
>> It always
>> convert them to UTF-8 C strings on all APIs without regarding preferred
>> encoding or file system encoding on Python. As some of APIs accept
>> local paths and they should be encoded as locale's charset/encoding,
>> I also prefer to encode to bytes on caller side explicitly before call
>> APIs, to avoid making bugs. Of course, return values of wrapper APIs
>> corresponding char ** args and return value of C API are not decoded,
>> raw bytes objects.
>
> Thank you, I did not know this. The patch below undoes changes which I
> made in order to encode input to SVN APIs as UTF-8.
>
> The mailer.py test is still passing fine with this change.

Yes, however as I wrote above I prefer not to revert them.
 
I also found some bytes/str mixture still left around return values
for each call of Repository.get_rev_prop(), however I just found,
but not tested. So I continue to look over the code and I'll try
to fix the problems if they still exist.

Cheers,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsclub.org>
Received on 2020-12-15 11:48:55 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.