mailer.py py2/py3 change: non-UTF-8 environments (was: svn commit: r1884427 - in /subversion/trunk/tools/hook-scripts/mailer: mailer.py tests/mailer-t1.output tests/mailer-tweak.py)

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Mon, 21 Dec 2020 07:15:53 +0000

stsp_at_apache.org wrote on Mon, 14 Dec 2020 16:57 -0000:
> URL: http://svn.apache.org/viewvc?rev=1884427&view=rev
> Log:
> Make mailer.py work properly with Python 3, and drop Python 2 support.
> Most of the changes deal with the handling of binary data vs Python strings.
> I've made sure that mailer.py will work in a UTF-8 environment. In general,
> UTF-8 is recommended for hook scripts. See the SVNUseUTF8 mod_dav_svn option.
> Environments using other encodings may not work as expected, but those will
> be problematic for hook scripts in general.

Correct me if I'm wrong, but it sounds like you haven't ruled out the
possibility that this commit will constitute a regression for anyone
who runs mailer.py in a non-UTF-8 environment and will upgrade to this

I suppose it's fair to classify non-UTF-8 environments as "patches
welcome", following the precedent of libmagic support in the Windows
build, but:

1. Can we detect non-UTF-8 environments and warn or error out hard upon
them? «locale.getlocale()[1]» seems promising?

2. A change that hasn't been confirmed *not* to constitute a regression
merits a release notes entry. Would you do the honours?
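
For (1), a minimal sketch of what such a check could look like. The
function names are hypothetical and are not part of mailer.py. The
sketch assumes that comparing codec names via codecs.lookup() is an
acceptable way to recognise UTF-8 aliases, and that a locale reporting
no encoding at all (e.g. "C") should count as non-UTF-8:

```python
import codecs
import locale
import sys


def is_utf8_encoding(name):
    """Return True if NAME is an alias of the UTF-8 codec (hypothetical helper)."""
    if not name:
        # getlocale() reports None for locales such as "C".
        return False
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown codec name: treat as non-UTF-8.
        return False


def check_utf8_environment(strict=False, stream=sys.stderr):
    """Warn (or exit, if STRICT) when the locale encoding is not UTF-8."""
    try:
        encoding = locale.getlocale()[1]
    except ValueError:
        # getlocale() can raise ValueError for locale names it cannot parse.
        encoding = None
    if is_utf8_encoding(encoding):
        return True
    stream.write("mailer.py: locale encoding is %r, not UTF-8\n" % (encoding,))
    if strict:
        sys.exit(1)
    return False
```

Whether to warn or to error out would be a policy decision; the sketch
only warns unless strict is set.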



> SVN repositories store internal
> data such as paths in UTF-8. Our Python3 bindings do not deal with encoding
> or decoding of such data, and thus need to work with raw UTF-8 strings, not
> Python strings.
>
> The encoding of file and property contents is not guaranteed to be UTF-8.
> This was already a problem before this change. This hook script sends email
> with a content type header specifying the UTF-8 encoding. Diffs which contain
> non-UTF-8 text will most likely not render properly when viewed in an email
> reader. At least this problem is now obvious in mailer.py's implementation,
> since all unidiff text is now written out directly as binary data.
>
> As an additional fix, iterate file groups in sorted order. This results in
> stable output and makes test cases in our tests/ subdirectory reproducible.
>
> Tested with Python 3.7.5 which is the version I use in my SVN development
> setup at present. Tests with newer versions are welcome.
>
> * tools/hook-scripts/mailer/mailer.py:
>   Drop Python2-specific includes. Adjust includes as per 2to3.
>   (main): Decode arguments from UTF-8 to string.
>   (OutputBase:write): Encode string to UTF-8 and pass to write_binary().
>    OutputBase implementations now need to provide a self.write_binary
>    member which implements a write() method for binary data.
>   (MailedOutput): email.Header package is gone, use email.header instead,
>    and likewise replace use of email.Utils with email.utils
>   (SMTPOutput): Provide self.write_binary in terms of a BytesIO() object.
>    We cannot use StringIO since diffs may contain data in arbitrary encodings.
>   (StandardOutput): Provide self.write_binary in terms of stdout.buffer.
>   (PipeOutput): Provide self.write_binary in terms of pipe.stdin.
>   (Commit): Decode log message and paths from UTF-8 to string, and iterate
>    path groups from mailer.conf in sorted order.
>   (Lock): Decode directory entries from UTF-8 to string. Encode paths back
>    to UTF-8 when we ask libsvn_fs for a lock on a path.
>    Iterate path groups from mailer.conf in sorted order.
>   (DiffGenerator): Decode repository paths from UTF-8 to string.
>   (TextCommitRenderer): Decode author, log message, and path from UTF-8 to
>    string. Write diff data via write_binary, bypassing the re-encoding step.
>   (Config): Decode paths from UTF-8 to string before matching them against
>    regular expressions. Also decode the repository directory path from UTF-8.
>
> * tools/hook-scripts/mailer/tests/mailer-t1.output: Adjust expected output.
>   File groups are now provided in stable sorted order. This should fix
>   spurious test failures in the future.
>
> * tools/hook-scripts/mailer/tests/mailer-tweak.py: Drop L suffix from long
>   integers and pass binary data instead of strings into libsvn_fs.
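
The write()/write_binary() split described in the log can be
illustrated roughly as follows. This is a sketch, not the code from
r1884427: the class name OutputSketch is hypothetical, and it assumes
write_binary is simply a callable that accepts bytes, here backed by a
BytesIO() buffer as the log describes for SMTPOutput.

```python
import io


class OutputSketch:
    """Hypothetical stand-in for an OutputBase-style class."""

    def __init__(self):
        self.buffer = io.BytesIO()
        # Assumption: write_binary is a callable that takes bytes.
        self.write_binary = self.buffer.write

    def write(self, text):
        # Headers, log messages and paths are str: encode them as UTF-8.
        self.write_binary(text.encode("utf-8"))


out = OutputSketch()
out.write("Author: jrandom\n")

# Diff bytes in some other encoding (Latin-1 here) bypass the encoding
# step and are written through unchanged.
latin1_diff = "+caf\u00e9\n".encode("latin-1")
out.write_binary(latin1_diff)

data = out.buffer.getvalue()
```

The resulting bytes are not valid UTF-8 as a whole, which is the
rendering problem the log mentions for mail sent with a UTF-8 content
type header.
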
Received on 2020-12-21 08:16:01 CET

This is an archived mail posted to the Subversion Dev mailing list.