mailer.py cannot handle utf-8 path in Subject correctly (Re: mailer.py can produce subject header violates RFC 5321/5322 if truncate_subject is not set)

From: Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>
Date: Wed, 8 Jan 2020 00:26:39 +0900

On 2020/01/07 9:41, Yasuhito FUTATSUKI wrote:
> On 2020/01/07 6:52, Yasuhito FUTATSUKI wrote:
>> By the way, there seems to be another issue with truncate_subject: the
>> current implementation may break a UTF-8 multi-byte character
>> sequence. I haven't reproduced it, though (because I always use ASCII
>> characters only for file names...).

I could reproduce this problem.

With this shell script:
[[[
#!/bin/sh

# LC_CTYPE should be a valid UTF-8 locale. Please change it if the value
# below is not appropriate.
export LC_CTYPE=en_US.UTF-8

# Assumes 'svnadmin', 'sed', 'chmod', 'cp', 'mkdir', 'python2', and 'cat'
# are in the command search path, the Python bindings are installed correctly,
# and subversion_wc points to an appropriate checkout path.
subversion_wc='/path/to/subversion/trunk/working/copy'

# set up new repo for mailer.py testing
svnadmin create newrepo
cp ${subversion_wc}/tools/hook-scripts/mailer/mailer.py newrepo/hooks
sed -e 's/^#truncate_subject = 200/truncate_subject = 78/' \
  -e 's/^#mail_command.*/mail_command = cat -/' \
  ${subversion_wc}/tools/hook-scripts/mailer/mailer.conf.example \
> newrepo/hooks/mailer.conf
sed -e 's/^\(mailer\.py.*\)\/path\/to\/\(mailer\.conf\)/env python2 "\$REPOS"\/hooks\/\1 "\$REPOS"\/hooks\/\2/' \
  newrepo/hooks/post-commit.tmpl > newrepo/hooks/post-commit
chmod +x newrepo/hooks/post-commit

svn checkout file:///`pwd`/newrepo wd && cd wd
svn mkdir '〇〇〇一' '〇〇〇二' '〇〇〇三' '〇〇〇四' '〇〇〇五' '〇〇〇六'
svn commit -m 'test for mailer.py'
]]]

The result is:
[[[
Checked out revision 0.
A 〇〇〇一
A 〇〇〇二
A 〇〇〇三
A 〇〇〇四
A 〇〇〇五
A 〇〇〇六
Adding 〇〇〇一
Adding 〇〇〇三
Adding 〇〇〇二
Adding 〇〇〇五
Adding 〇〇〇六
Adding 〇〇〇四
Committing transaction...
Committed revision 1.

Warning: post-commit hook failed (exit code 1) with output:
Traceback (most recent call last):
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 1534, in <module>
    sys.argv[3:3+expected_args])
  File "/usr/local/lib/python2.7/site-packages/svn/core.py", line 310, in run_app
    return func(application_pool, *args, **kw)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 126, in main
    return messenger.generate()
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 489, in generate
    self.output.start(group, params)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 394, in start
    self.write(self.mail_headers(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 251, in mail_headers
    subject = self._rfc2047_encode(self.make_subject(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 246, in _rfc2047_encode
    return ' '.join(map(_maybe_encode_header, hdr.split()))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 244, in _maybe_encode_header
    return Header(hdr_token, 'utf-8').encode()
  File "/usr/local/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/local/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-4: invalid continuation byte
cat: -f: No such file or directory
cat: invalid_at_example.com: No such file or directory
cat: invalid_at_example.com: No such file or directory

]]]
(The lines starting with 'cat:' are the expected output of
"cat - -f invalid_at_example.com invalid_at_example.com".)
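
The "position 3-4" in the traceback is consistent with a plain byte slice cutting a character of the last path in half. The following is only a sketch in Python 3 (the hook above ran under python2). It assumes the subject is "r1 - " followed by the directory names, space-separated, in the order of the "Adding" lines; the variable names are made up for illustration.

```python
# Sketch only: Python 3 bytes, not the python2 str that mailer.py handled.
# Assumption: subject = "r1 - " + directory names joined by single spaces,
# in the order of the "Adding" lines above.
names = [
    "\u3007\u3007\u3007\u4e00",  # 〇〇〇一
    "\u3007\u3007\u3007\u4e09",  # 〇〇〇三
    "\u3007\u3007\u3007\u4e8c",  # 〇〇〇二
    "\u3007\u3007\u3007\u4e94",  # 〇〇〇五
    "\u3007\u3007\u3007\u516d",  # 〇〇〇六
    "\u3007\u3007\u3007\u56db",  # 〇〇〇四
]
subject = ("r1 - " + " ".join(names)).encode("utf-8")
limit = 78  # truncate_subject as set in the test mailer.conf

# What the unpatched code does: slice at a fixed byte offset.
truncated = subject[:limit - 3] + b"..."

# The header is encoded token by token, so inspect the last token.
last_token = truncated.split()[-1]
try:
    last_token.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(len(subject))  # byte length before truncation
print(last_token)    # ends with a partial character followed by "..."
print(error)
```

Under these assumptions the slice keeps one complete character of the last name plus the first two bytes of the next one, so decoding the last token fails at offset 3.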

> Probably it needs something like this (but it doesn't support combining
> characters, and I didn't do any testing...):
> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1872398)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -159,7 +159,13 @@
>        truncate_subject = 0
>
>      if truncate_subject and len(subject) > truncate_subject:
> -      subject = subject[:(truncate_subject - 3)] + "..."
> +      # To avoid breaking utf-8 multi-bytes character sequence, we should
> +      # search the top of the sequence if the byte of the truncate point is
> +      # second or later part of multi-bytes character sequence.
> +      idx = truncate_subject - 3
> +      while 0x80 <= ord(subject[idx]) <= 0xbf:
> +        idx -= 1
> +      subject = subject[:idx] + "..."
>      return subject
>
>    def start(self, group, params):
> ]]]

With this patch applied, the script above runs without error.
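
The effect of the back-off loop can be checked in isolation. This is a sketch, not the patch itself: it uses Python 3 bytes instead of a python2 str, the helper name truncate_utf8 is hypothetical, the "idx > 0" guard is an addition that is not in the patch, and the subject is reconstructed from the test above on the assumption that it is "r1 - " plus the space-separated names.

```python
def truncate_utf8(subject, limit):
    """Cut UTF-8 bytes to at most `limit` bytes, ending in '...',
    without splitting a multi-byte character.

    Hypothetical helper mirroring the loop in the patch; like the patch,
    it does not treat combining characters specially.
    """
    if not limit or len(subject) <= limit:
        return subject
    idx = limit - 3
    # 0x80..0xBF are UTF-8 continuation bytes: step back to the lead byte.
    while idx > 0 and 0x80 <= subject[idx] <= 0xBF:
        idx -= 1
    return subject[:idx] + b"..."


# Assumed subject: "r1 - " + names in the order of the "Adding" lines.
names = [
    "\u3007\u3007\u3007\u4e00",  # 〇〇〇一
    "\u3007\u3007\u3007\u4e09",  # 〇〇〇三
    "\u3007\u3007\u3007\u4e8c",  # 〇〇〇二
    "\u3007\u3007\u3007\u4e94",  # 〇〇〇五
    "\u3007\u3007\u3007\u516d",  # 〇〇〇六
    "\u3007\u3007\u3007\u56db",  # 〇〇〇四
]
subject = ("r1 - " + " ".join(names)).encode("utf-8")

result = truncate_utf8(subject, 78)
print(len(result))
print(result.decode("utf-8"))  # decodes cleanly; the cut falls on a character boundary
```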

However, the patched mailer.py produces the Subject line below.

[[[
Subject: r1 - =?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?= =?utf-8?b?44CH44CH44CH5LqM?= =?utf-8?b?44CH44CH44CH5LqU?= =?utf-8?b?44CH44CH44CH5YWt?= =?utf-8?b?44CHLi4u?=^M
]]]

and the decoded result is

"Subject: r1 - 〇〇〇一〇〇〇三〇〇〇二〇〇〇五〇〇〇六〇..."

because white space between adjacent encoded-words is ignored when decoding.
I think this is not what we want.
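
The same effect can be seen with the standard email.header module. The sketch below is Python 3, and the variable names and the shortened two-name subject are made up for illustration. It encodes a subject token by token, which is roughly what the traceback suggests _rfc2047_encode does. Then, as one possible alternative rather than a tested fix, it encodes the whole subject as a single header value so that the spaces are carried inside the encoded text.

```python
from email.header import Header, decode_header, make_header

# Shortened, assumed subject: "r1 - 〇〇〇一 〇〇〇三"
subject = "r1 - \u3007\u3007\u3007\u4e00 \u3007\u3007\u3007\u4e09"


def is_ascii(token):
    """True if every character of the token is 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in token)


# Token by token: each non-ASCII token becomes its own encoded-word,
# and the tokens are joined with plain spaces.
per_token = " ".join(
    tok if is_ascii(tok) else Header(tok, "utf-8").encode()
    for tok in subject.split()
)

# Whole value at once: the spaces are part of the encoded text.
whole = Header(subject, "utf-8").encode()

decoded_per_token = str(make_header(decode_header(per_token)))
decoded_whole = str(make_header(decode_header(whole)))

print(per_token)
print(decoded_per_token)  # the space between the two names is lost
print(decoded_whole)      # the original subject comes back unchanged
```

Encoding the whole value also turns the ASCII prefix "r1 - " into encoded text, so the raw header is less readable; that is the trade-off of this approach.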

Cheers,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org> / <futatuki_at_poem.co.jp>

Received on 2020-01-07 16:28:53 CET
