
mailer.py cannot handle utf-8 path in Subject correctly (Re: mailer.py can produce subject header violates RFC 5321/5322 if truncate_subject is not set)

From: Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>
Date: Wed, 8 Jan 2020 00:26:39 +0900

On 2020/01/07 9:41, Yasuhito FUTATSUKI wrote:
> On 2020/01/07 6:52, Yasuhito FUTATSUKI wrote:
>> By the way, there seems to be another issue with truncate_subject: the
>> current implementation may break a UTF-8 multi-byte character
>> sequence. I haven't reproduced it, though (because I always use
>> ASCII-only file names...).

I was able to reproduce this problem with the following shell script:
[[[
#!/bin/sh

# LC_CTYPE should be a valid UTF-8 locale. Please change it if the value
# below is not appropriate for your system.
export LC_CTYPE=en_US.UTF-8

# Assumes that 'svnadmin', 'svn', 'sed', 'chmod', 'cp', 'python2', and 'cat'
# are in the command search path, that the Python bindings are installed
# correctly, and that subversion_wc points to an appropriate checkout path.
subversion_wc='/path/to/subversion/trunk/working/copy'

# set up new repo for mailer.py testing
svnadmin create newrepo
cp ${subversion_wc}/tools/hook-scripts/mailer/mailer.py newrepo/hooks
sed -e 's/^#truncate_subject = 200/truncate_subject = 78/' \
  -e 's/^#mail_command.*/mail_command = cat -/' \
  ${subversion_wc}/tools/hook-scripts/mailer/mailer.conf.example \
> newrepo/hooks/mailer.conf
sed -e 's/^\(mailer\.py.*\)\/path\/to\/\(mailer\.conf\)/env python2 "\$REPOS"\/hooks\/\1 "\$REPOS"\/hooks\/\2/' \
  newrepo/hooks/post-commit.tmpl > newrepo/hooks/post-commit
chmod +x newrepo/hooks/post-commit

svn checkout file:///`pwd`/newrepo wd && cd wd
svn mkdir '〇〇〇一' '〇〇〇二' '〇〇〇三' '〇〇〇四' '〇〇〇五' '〇〇〇六'
svn commit -m 'test for mailer.py'
]]]

The result is:
[[[
Checked out revision 0.
A 〇〇〇一
A 〇〇〇二
A 〇〇〇三
A 〇〇〇四
A 〇〇〇五
A 〇〇〇六
Adding 〇〇〇一
Adding 〇〇〇三
Adding 〇〇〇二
Adding 〇〇〇五
Adding 〇〇〇六
Adding 〇〇〇四
Committing transaction...
Committed revision 1.

Warning: post-commit hook failed (exit code 1) with output:
Traceback (most recent call last):
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 1534, in <module>
    sys.argv[3:3+expected_args])
  File "/usr/local/lib/python2.7/site-packages/svn/core.py", line 310, in run_app
    return func(application_pool, *args, **kw)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 126, in main
    return messenger.generate()
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 489, in generate
    self.output.start(group, params)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 394, in start
    self.write(self.mail_headers(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 251, in mail_headers
    subject = self._rfc2047_encode(self.make_subject(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 246, in _rfc2047_encode
    return ' '.join(map(_maybe_encode_header, hdr.split()))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 244, in _maybe_encode_header
    return Header(hdr_token, 'utf-8').encode()
  File "/usr/local/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/local/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-4: invalid continuation byte
cat: -f: No such file or directory
cat: invalid_at_example.com: No such file or directory
cat: invalid_at_example.com: No such file or directory

]]]
(The lines starting with 'cat:' are the expected output of
 "cat - -f invalid_at_example.com invalid_at_example.com".)
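
To show where the traceback comes from, here is a minimal sketch of the
failing slice. It is written for Python 3 (mailer.py in this test ran on
Python 2, where str is a byte string), and the subject text is reconstructed
from the Subject line shown later in this message, with the sixth directory
name assumed to be the remaining one. It is an illustration only, not code
from mailer.py.

```python
# Python 3 sketch. The subject is an assumption reconstructed from the
# output in this thread: "r1 - " followed by the six directory names.
names = ["〇〇〇一", "〇〇〇三", "〇〇〇二", "〇〇〇五", "〇〇〇六", "〇〇〇四"]
subject = ("r1 - " + " ".join(names)).encode("utf-8")  # 82 bytes

truncate_subject = 78
# Same arithmetic as the unpatched code: keep the first 75 bytes and
# append "...". Byte 75 falls inside a three-byte character.
truncated = subject[:truncate_subject - 3] + b"..."

# mailer.py encodes each whitespace-separated token on its own, so only
# the last token carries the broken byte sequence.
last_token = truncated.split()[-1]
try:
    last_token.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc)
```

The last token ends up as one complete '〇', the first two bytes of the next
'〇', and then "...", which is why the decoder complains about an invalid
continuation byte inside that token.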
 
> Probably it needs something like this (but it doesn't support combining
> characters, and I haven't tested it...):
> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1872398)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -159,7 +159,13 @@
>        truncate_subject = 0
>
>      if truncate_subject and len(subject) > truncate_subject:
> -      subject = subject[:(truncate_subject - 3)] + "..."
> +      # To avoid breaking a utf-8 multi-byte character sequence, we should
> +      # search for the top of the sequence if the byte at the truncation
> +      # point is the second or a later byte of a multi-byte sequence.
> +      idx = truncate_subject - 3
> +      while 0x80 <= ord(subject[idx]) <= 0xbf:
> +        idx -= 1
> +      subject = subject[:idx] + "..."
>      return subject
>
>    def start(self, group, params):
> ]]]
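
For readers who want to try the idea outside mailer.py, the same
boundary search can be sketched as a standalone Python 3 function. The name
truncate_utf8 and its parameters are hypothetical and not part of mailer.py;
like the quoted patch, it does not account for combining characters, and it
does not treat very small limits specially.

```python
def truncate_utf8(subject: bytes, limit: int, ellipsis: bytes = b"...") -> bytes:
    """Shorten UTF-8 encoded bytes without splitting a multi-byte character.

    Hypothetical helper for illustration; a limit of 0 means "no truncation",
    mirroring how truncate_subject = 0 is treated in the quoted code.
    """
    if limit <= 0 or len(subject) <= limit:
        return subject
    idx = max(limit - len(ellipsis), 0)
    # UTF-8 continuation bytes look like 0b10xxxxxx (0x80..0xBF). If the cut
    # point lands on one, step back until it sits on the start of a character.
    while idx > 0 and 0x80 <= subject[idx] <= 0xBF:
        idx -= 1
    return subject[:idx] + ellipsis


# Subject reconstructed from the output in this thread (an assumption).
names = ["〇〇〇一", "〇〇〇三", "〇〇〇二", "〇〇〇五", "〇〇〇六", "〇〇〇四"]
subject = ("r1 - " + " ".join(names)).encode("utf-8")
result = truncate_utf8(subject, 78)
print(result.decode("utf-8"))
```

With the reconstructed subject, the cut moves back from byte 75 to byte 73,
so the result is 76 bytes long and decodes cleanly.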

With the quoted patch applied, the script above runs without error.

However, it produces the Subject line below.

[[[
Subject: r1 - =?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?= =?utf-8?b?44CH44CH44CH5LqM?= =?utf-8?b?44CH44CH44CH5LqU?= =?utf-8?b?44CH44CH44CH5YWt?= =?utf-8?b?44CHLi4u?=^M
]]]

and the decoded result is

"Subject: r1 - 〇〇〇一〇〇〇三〇〇〇二〇〇〇五〇〇〇六〇..."

because, per RFC 2047, white space between adjacent encoded-words is
ignored when the header is decoded. I think this is not what we want.
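
The effect can be checked with the standard email.header module. The
following Python 3 sketch decodes two of the encoded-words from the Subject
line above, and then, as one possible direction only, encodes the whole
phrase with a single Header so that the space travels inside the encoded
text. The helper name decode is hypothetical, and this is not a proposed
patch to mailer.py.

```python
from email.header import Header, decode_header, make_header


def decode(value: str) -> str:
    """Decode an RFC 2047 header value to text (hypothetical helper)."""
    return str(make_header(decode_header(value)))


# Two encoded-words copied from the Subject line above, separated by a space,
# as produced when each whitespace-separated token is encoded on its own.
per_token = "=?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?="
print(decode(per_token))  # the separating space is dropped

# Encoding the whole phrase at once keeps the space inside the encoded text.
whole = Header("〇〇〇一 〇〇〇三", "utf-8").encode()
print(decode(whole))
```

Whether encoding the entire subject as one header value is the right fix for
mailer.py is a separate question; the sketch only shows that the lost
separators come from encoding each token individually.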

Cheers,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org> / <futatuki_at_poem.co.jp>
Received on 2020-01-07 16:28:53 CET
