
[RFC] str versus bytes in subversion/tests/cmdline

From: Arfrever Frehtes Taifersar Arahesis <Arfrever.FTA_at_GMail.Com>
Date: Tue, 31 Mar 2009 14:20:13 +0200

Python 3 contains major changes in handling of strings.
http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

The str type was renamed to bytes. ("string" -> b"string")
The unicode type was renamed to str. (u"string" -> "string")

I will use the Python 3 names of these types in this e-mail.

In Python 2:
>>> "abc" == u"abc"
True
>>>

In Python 3:
>>> b"abc" == "abc"
False
>>>

Explicit encoding / decoding between these types is now required.

bytes.decode() returns str.
str.encode() returns bytes.

(bytes type doesn't support encode(). str type doesn't support decode().)
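A minimal sketch of the explicit round trip described above, assuming
Python 3. The sample string and the variable names are only illustrative.

```python
# Explicit conversion between str (text) and bytes (data) in Python 3.
text = "caf\u00e9"                 # str: a sequence of Unicode code points
data = text.encode("utf-8")        # str.encode() returns bytes
round_trip = data.decode("utf-8")  # bytes.decode() returns str

print(type(data).__name__, type(round_trip).__name__)  # bytes str
print(round_trip == text)                              # True

# The reverse methods do not exist on either type.
print(hasattr(b"abc", "encode"))   # False
print(hasattr("abc", "decode"))    # False
```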

subversion/tests/cmdline tests use subprocess.Popen to obtain
the output of all commands and to send the input to them.
The subprocess.Popen.{stdin,stdout,stderr} streams are binary by default
and support only the bytes type.
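To illustrate, here is a small sketch, assuming Python 3 and the default
(binary) pipe mode. The child command is a stand-in for svn and is purely
hypothetical: it just upper-cases whatever it reads on stdin.

```python
import subprocess
import sys

# Hypothetical child process standing in for an svn invocation.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

# Input must be passed as bytes; output and errors come back as bytes.
out, err = proc.communicate(b"abc")

print(type(out).__name__)  # bytes
print(out == b"ABC")       # True
print(out == "ABC")        # False: bytes never compare equal to str
```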

Decoding fails on byte sequences which are not valid UTF-8.

The merge_tests.py 4 ("some simple property merges") test sets some
properties to values containing invalid UTF-8 and later checks the output
of svn.
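The failure can be reproduced with a short sketch, assuming Python 3. The
byte string below is an invented example of a property value, not the one
used by merge_tests.py.

```python
# Invented property value; 0xff can never occur in valid UTF-8.
raw = b"propval-\xff"

try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

print(failed)  # True

# Decoding with errors="replace" does not raise, but it is lossy:
# the original bytes cannot be recovered by encoding again.
lossy = raw.decode("utf-8", "replace")
print(lossy.encode("utf-8") == raw)  # False
```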

This problem has 2 solutions:

1. Internally store the output of commands in bytes type, perform some
encodings/decodings and convert a huge number of strings to bytes type
(i.e. "string" -> b"string" in source code).

Invalid UTF-8 characters would still be supported by
subversion/tests/cmdline/svntest.

See the attached, unfinished patch (subversion-svntest-python-3.patch) for
the "python-3-compatibility" branch which makes basic_tests.py 1 ("basic
checkout of a wc") test pass with both Python 2.6 and Python 3.0!

2. Internally store the output of commands in str type, decode the output
of commands immediately after obtaining it from subprocess.Popen, convert
a significantly smaller number of strings to bytes type and *ban invalid
UTF-8 characters* in subversion/tests/cmdline.

In this case merge_tests.py 4 test would have to be changed to no longer
set invalid UTF-8 characters in some properties.

See the attached patch (subversion-svntest-decode_subprocess_output.patch)
for trunk which implements decoding of the output of commands.
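The difference between the two solutions can be sketched as follows,
assuming Python 3. This is not the code from either attached patch: the
helper names run_command_bytes, run_command_str and child are hypothetical,
and a small Python child process stands in for svn.

```python
import subprocess
import sys


def run_command_bytes(argv):
    """Solution 1 (sketch): keep the output as lists of bytes lines."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out.splitlines(True), err.splitlines(True)


def run_command_str(argv, encoding="utf-8"):
    """Solution 2 (sketch): decode right after reading from the pipes."""
    out, err = run_command_bytes(argv)
    return ([line.decode(encoding) for line in out],
            [line.decode(encoding) for line in err])


def child(code):
    """Hypothetical stand-in for an svn command line."""
    return [sys.executable, "-c", code]


valid = child("import sys; sys.stdout.buffer.write(b'A    iota\\n')")
invalid = child("import sys; sys.stdout.buffer.write(b'\\xff\\n')")

# Solution 1: expected output in the tests must be written as bytes.
bytes_lines, _ = run_command_bytes(valid)
print(bytes_lines == [b"A    iota\n"])  # True

# Solution 2: expected output can stay as ordinary str literals.
str_lines, _ = run_command_str(valid)
print(str_lines == ["A    iota\n"])     # True

# Invalid UTF-8 survives solution 1 but makes solution 2 raise.
raw_lines, _ = run_command_bytes(invalid)
try:
    run_command_str(invalid)
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(raw_lines == [b"\xff\n"], decoded_ok)  # True False
```

With solution 2 the cost is visible in the last step: any test whose
output contains invalid UTF-8 fails at decode time, which is why such
data would have to be banned from the test suite.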

--
Arfrever Frehtes Taifersar Arahesis
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1495347


Received on 2009-03-31 14:20:34 CEST

This is an archived mail posted to the Subversion Dev mailing list.