On 2020/01/07 9:41, Yasuhito FUTATSUKI wrote:
> On 2020/01/07 6:52, Yasuhito FUTATSUKI wrote:
>> By the way, there seems to be another issue with truncate_subject: the current
>> implementation of truncate_subject may break a UTF-8 multi-byte character
>> sequence, but I haven't reproduced it (because I always use only ASCII
>> characters for file names...).
I was able to reproduce this problem with the following shell script:
[[[
#!/bin/sh
# LC_CTYPE should be a valid UTF-8 locale. Please change it if the value below
# is not appropriate.
export LC_CTYPE=en_US.UTF-8
# Assumes 'svn', 'svnadmin', 'sed', 'chmod', 'cp', 'env', 'python2', and 'cat'
# are in the command search path, the Python bindings are installed correctly,
# and subversion_wc points to an appropriate checkout path.
subversion_wc='/path/to/subversion/trunk/working/copy'
# set up new repo for mailer.py testing
svnadmin create newrepo
cp ${subversion_wc}/tools/hook-scripts/mailer/mailer.py newrepo/hooks
sed -e 's/^#truncate_subject = 200/truncate_subject = 78/' \
-e 's/^#mail_command.*/mail_command = cat -/' \
${subversion_wc}/tools/hook-scripts/mailer/mailer.conf.example \
> newrepo/hooks/mailer.conf
sed -e 's/^\(mailer\.py.*\)\/path\/to\/\(mailer\.conf\)/env python2 "\$REPOS"\/hooks\/\1 "\$REPOS"\/hooks\/\2/' \
newrepo/hooks/post-commit.tmpl > newrepo/hooks/post-commit
chmod +x newrepo/hooks/post-commit
svn checkout file:///`pwd`/newrepo wd && cd wd
svn mkdir '〇〇〇一' '〇〇〇二' '〇〇〇三' '〇〇〇四' '〇〇〇五' '〇〇〇六'
svn commit -m 'test for mailer.py'
]]]
The result is:
[[[
Checked out revision 0.
A 〇〇〇一
A 〇〇〇二
A 〇〇〇三
A 〇〇〇四
A 〇〇〇五
A 〇〇〇六
Adding 〇〇〇一
Adding 〇〇〇三
Adding 〇〇〇二
Adding 〇〇〇五
Adding 〇〇〇六
Adding 〇〇〇四
Committing transaction...
Committed revision 1.
Warning: post-commit hook failed (exit code 1) with output:
Traceback (most recent call last):
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 1534, in <module>
    sys.argv[3:3+expected_args])
  File "/usr/local/lib/python2.7/site-packages/svn/core.py", line 310, in run_app
    return func(application_pool, *args, **kw)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 126, in main
    return messenger.generate()
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 489, in generate
    self.output.start(group, params)
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 394, in start
    self.write(self.mail_headers(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 251, in mail_headers
    subject = self._rfc2047_encode(self.make_subject(group, params))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 246, in _rfc2047_encode
    return ' '.join(map(_maybe_encode_header, hdr.split()))
  File "/home/futatuki/tmp/svn-test/mailer_test/newrepo/hooks/mailer.py", line 244, in _maybe_encode_header
    return Header(hdr_token, 'utf-8').encode()
  File "/usr/local/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/local/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-4: invalid continuation byte
cat: -f: No such file or directory
cat: invalid_at_example.com: No such file or directory
cat: invalid_at_example.com: No such file or directory
]]]
(The lines starting with 'cat:' are the expected output of
"cat - -f invalid_at_example.com invalid_at_example.com".)
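For reference, the UnicodeDecodeError can be reproduced outside the hook with a small Python 3 sketch. It is a simplified re-creation, not mailer.py itself: it assumes the subject is the UTF-8 byte string "r1 - " followed by the six directory names in commit order (as the decoded Subject quoted later suggests), and the helper name truncate_subject_naive is hypothetical. Cutting the 82-byte subject at 75 bytes leaves only the first two bytes of a 3-byte '〇', so the last whitespace-separated token handed to Header() is no longer valid UTF-8.

```python
def truncate_subject_naive(subject, truncate_subject):
    # Same slicing as the unpatched make_subject(), applied to UTF-8 bytes
    # (hypothetical helper name, not part of mailer.py).
    if truncate_subject and len(subject) > truncate_subject:
        subject = subject[:(truncate_subject - 3)] + b"..."
    return subject


# Assumed subject: "r1 - " plus the six directory names in commit order.
names = ['〇〇〇一', '〇〇〇三', '〇〇〇二', '〇〇〇五', '〇〇〇六', '〇〇〇四']
subject = ('r1 - ' + ' '.join(names)).encode('utf-8')   # 82 bytes

truncated = truncate_subject_naive(subject, 78)
# The last whitespace-separated token is what Header(token, 'utf-8') receives.
last_token = truncated.split()[-1]
try:
    last_token.decode('utf-8')
except UnicodeDecodeError as exc:
    # Reports an invalid continuation byte at positions 3-4, as in the traceback.
    print(exc)
```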
> Probably it needs something like this (but it doesn't support combining
> characters, and I haven't done any testing...):
> [[[
> Index: tools/hook-scripts/mailer/mailer.py
> ===================================================================
> --- tools/hook-scripts/mailer/mailer.py (revision 1872398)
> +++ tools/hook-scripts/mailer/mailer.py (working copy)
> @@ -159,7 +159,13 @@
> truncate_subject = 0
>
> if truncate_subject and len(subject) > truncate_subject:
> - subject = subject[:(truncate_subject - 3)] + "..."
> + # To avoid breaking utf-8 multi-bytes character sequence, we should
> + # search the top of the sequence if the byte of the truncate point is
> + # second or later part of multi-bytes character sequence.
> + idx = truncate_subject - 3
> + while 0x80 <= ord(subject[idx]) <= 0xbf:
> + idx -= 1
> + subject = subject[:idx] + "..."
> return subject
>
> def start(self, group, params):
> ]]]
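A Python 3 transliteration of the patch may make the back-off loop easier to follow. It is a sketch, not the mailer.py code: the function name truncate_subject_bytes is hypothetical, and the subject is the same assumed byte string as in the reproduction. In Python 3, indexing a bytes object yields an int, so ord() is not needed.

```python
def truncate_subject_bytes(subject, truncate_subject):
    """Python 3 transliteration of the proposed patch (hypothetical helper).

    subject is a UTF-8 encoded bytes object, like the Python 2 str in mailer.py.
    """
    if truncate_subject and len(subject) > truncate_subject:
        idx = truncate_subject - 3
        # 0x80..0xbf are UTF-8 continuation bytes (10xxxxxx); step back until
        # idx points at the lead byte of the character that would be cut.
        while 0x80 <= subject[idx] <= 0xbf:
            idx -= 1
        subject = subject[:idx] + b"..."
    return subject


names = ['〇〇〇一', '〇〇〇三', '〇〇〇二', '〇〇〇五', '〇〇〇六', '〇〇〇四']
subject = ('r1 - ' + ' '.join(names)).encode('utf-8')
result = truncate_subject_bytes(subject, 78)
print(result.decode('utf-8'))  # r1 - 〇〇〇一 〇〇〇三 〇〇〇二 〇〇〇五 〇〇〇六 〇...
```

Here the loop steps back from byte 75 to the lead byte at 73, so every token decodes cleanly. Like the patch, it only keeps whole code points and can still separate a base character from a following combining character.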
After applying this patch, the script above runs without error.
However, it produces the following Subject line:
[[[
Subject: r1 - =?utf-8?b?44CH44CH44CH5LiA?= =?utf-8?b?44CH44CH44CH5LiJ?= =?utf-8?b?44CH44CH44CH5LqM?= =?utf-8?b?44CH44CH44CH5LqU?= =?utf-8?b?44CH44CH44CH5YWt?= =?utf-8?b?44CHLi4u?=^M
]]]
and the decoded result is
"Subject: r1 - 〇〇〇一〇〇〇三〇〇〇二〇〇〇五〇〇〇六〇..."
because whitespace between adjacent encoded-words is ignored when the header
is decoded (RFC 2047, section 6.2).
I don't think this is what we want.
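The lost spaces can be reproduced with the standard email.header module alone. The sketch below is a simplified Python 3 stand-in for _rfc2047_encode/_maybe_encode_header as they appear in the traceback: split on whitespace, encode non-ASCII tokens with Header(token, 'utf-8').encode(), and rejoin with ' '. The ASCII pass-through check and all helper names are assumptions, not the actual mailer.py code.

```python
from email.header import Header, decode_header


def maybe_encode_token(token):
    # Assumption: ASCII-only tokens are passed through unencoded.
    try:
        token.encode('ascii')
        return token
    except UnicodeEncodeError:
        return Header(token, 'utf-8').encode()


def rfc2047_encode(hdr):
    # Same shape as _rfc2047_encode in the traceback: split on whitespace,
    # encode each token as needed, and rejoin with single spaces.
    return ' '.join(map(maybe_encode_token, hdr.split()))


def decode_for_display(value):
    # decode_header() drops whitespace between adjacent encoded-words and
    # merges them into one chunk per charset.
    return ''.join(
        chunk.decode(charset or 'us-ascii') if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value))


encoded = rfc2047_encode('r1 - 〇〇〇一 〇〇〇三 〇〇〇二')
print(encoded)
print(decode_for_display(encoded))  # r1 - 〇〇〇一〇〇〇三〇〇〇二
```

Only the spaces between two encoded-words disappear; the spaces in the plain "r1 - " prefix survive because they sit next to unencoded text.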
Cheers,
--
Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org> / <futatuki_at_poem.co.jp>
Received on 2020-01-07 16:28:53 CET