
Re: Python3 support for hot-backup.py

From: Yasuhito FUTATSUKI <futatuki_at_bsdclub.org>
Date: Tue, 16 Jun 2020 02:09:35 +0900

On 2020/06/16 0:18, C. Michael Pilato wrote:
> On Mon, Jun 15, 2020 at 10:32 AM Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
> wrote:
>
>> On 2020/06/15 21:38, C. Michael Pilato wrote:
>> Is something like this needed?
>>
>> [[[
>> Index: tools/backup/hot-backup.py.in
>> ===================================================================
>> --- tools/backup/hot-backup.py.in (revision 1878855)
>> +++ tools/backup/hot-backup.py.in (working copy)
>> @@ -218,7 +218,7 @@
>>
>>    if stderr_lines:
>>      raise Exception("Unable to find the youngest revision for repository '%s'"
>> -                    ": %s" % (repo_dir, stderr_lines[0].rstrip()))
>> +                    ": %s" % (repo_dir, stderr_lines[0].rstrip().decode()))
>>
>>    return stdout_lines[0].strip().decode();
>>
>> ]]]
>>
>> If svnlook runs in a locale other than C, this can raise UnicodeDecodeError,
>> but without it, the output from svnlook is embedded in the exception
>> message as the repr of a bytes object, like b'...'.
>> (Although I think this script implicitly assumes the C locale.)
>>
>> So I'm not sure whether applying this is better or not.
>>
>> 'return stdout_lines[0].strip().decode();' is okay (except for an extra ';',
>> but it is not critical), because stdout_lines[0] is always an ASCII string
>> in this context.
>>
>
> I removed a couple of stray semicolons in r1878859 -- thanks for catching
> that.
>
> As for your question: if I force "svnlook" to create errors (by setting
> the svnlook variable to "/usr/bin/svn"), today I see an error message with
> the b'...' formatting. Adding .decode() as you suggested makes the b'...'
> go away and I see what I'd expect to see. As far as I can tell, I'm using
> the "en-US.UTF-8" locale, though -- not the "C" one. But maybe I'm just
> getting lucky because the locale encoding is UTF-8 and not, say, Shift-JIS
> or something? I dunno.

I confirmed that the code with .decode() works well in the ja_JP.UTF-8 locale
on Python 3.6 and Python 3.7 (but I got an error like "'ascii' codec can't
decode byte 0xe3 in position 18: ordinal not in range(128)").

In a non-UTF-8, non-ASCII locale, .decode() without an explicit encoding
raises UnicodeDecodeError from the 'utf-8' codec on Python 3.6 and 3.7.
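
The behaviour can be reproduced in isolation, without svnlook. This is only a sketch: the byte string is the EUC-JP encoding of one Japanese word, and the names raw, msg_bytes and decoded are illustrative, not taken from hot-backup.py.

```python
# EUC-JP bytes for a Japanese word (katakana "fairu", i.e. "file").
raw = b"\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb"

# Without .decode(): %-formatting a bytes object embeds its repr, b'...'.
msg_bytes = "error: %s" % (raw,)

# With a bare .decode(): Python 3 always defaults to UTF-8 here,
# regardless of the locale, so EUC-JP input is rejected.
try:
    raw.decode()
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = str(exc)

# With the matching encoding named explicitly, decoding succeeds.
decoded = raw.decode("euc_jp")

print(msg_bytes)
print(utf8_ok)
print(decoded)
```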

Without .decode(), Python 3.6:
[[[
$ env LC_MESSAGES=ja_JP.eucJP LC_CTYPE=ja_JP.eucJP /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
Unable to find the youngest revision for repository '/home/futatuki/tmp/t': b"svnlook: E000002: \xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb '/home/futatuki/tmp/t/format' \xa4\xf2\xb3\xab\xa4\xb1\xa4\xde\xa4\xbb\xa4\xf3: \xa4\xbd\xa4\xce\xa4\xe8\xa4\xa6\xa4\xca\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb\xa4\xde\xa4\xbf\xa4\xcf\xa5\xc7\xa5\xa3\xa5\xec\xa5\xaf\xa5\xc8\xa5\xea\xa4\xcf\xa4\xa2\xa4\xea\xa4\xde\xa4\xbb\xa4\xf3"
]]]

With .decode(), Python 3.6:
[[[
$ env LC_MESSAGES=ja_JP.eucJP LC_CTYPE=ja_JP.eucJP /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
'utf-8' codec can't decode byte 0xa5 in position 18: invalid start byte
]]]

cf. With .decode(), UTF-8 locale (Japanese):
(This is also what we want to see on a ja_JP.eucJP terminal with the ja_JP.eucJP locale.)
[[[
$ env LC_MESSAGES=ja_JP.UTF-8 LC_CTYPE=ja_JP.UTF-8 /usr/local/bin/python3.6m tools/backup/hot-backup.py /home/futatuki/tmp/t /home/futatuki/tmp/svn-test/tt
Beginning hot backup of '/home/futatuki/tmp/t'.
Unable to find the youngest revision for repository '/home/futatuki/tmp/t': svnlook: E000002: ファイル '/home/futatuki/tmp/t/format' を開けません: そのようなファイルまたはディレクトリはありません
]]]

So if we want this script to run in a non-UTF-8, non-ASCII locale
with Python 3, it needs additional code to set the encoding.
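
One possible shape for that additional code, as a rough sketch only: decode_output is a hypothetical helper, not something that exists in hot-backup.py, and it assumes svnlook writes its messages in the encoding of the current locale. The "backslashreplace" error handler for decoding needs Python 3.5 or later.

```python
import locale


def decode_output(data, encoding=None):
    """Decode subprocess output (bytes) for use in a message.

    Hypothetical helper: falls back to the locale's preferred encoding
    when no encoding is given, and never raises on undecodable bytes.
    """
    if encoding is None:
        # Assumption: the child process used the current locale's encoding.
        encoding = locale.getpreferredencoding(False)
    # Undecodable bytes become \xNN escapes instead of raising
    # UnicodeDecodeError.
    return data.decode(encoding, errors="backslashreplace")


# EUC-JP input decoded with the matching encoding.
print(decode_output(b"\xa5\xd5\xa5\xa1\xa5\xa4\xa5\xeb", "euc_jp"))
# A mismatched encoding degrades to escapes rather than failing.
print(decode_output(b"\xa5", "utf-8"))
```

With something like this, the exception message would be built from decode_output(stderr_lines[0].rstrip()) instead of a bare .decode().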

Cheers,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Received on 2020-06-15 19:16:13 CEST

This is an archived mail posted to the Subversion Dev mailing list.
