Re: [RFC] str versus bytes in subversion/tests/cmdline

From: Arfrever Frehtes Taifersar Arahesis <Arfrever.FTA_at_GMail.Com>
Date: Wed, 1 Apr 2009 14:04:55 +0100

2009-03-31 14:20 Arfrever Frehtes Taifersar Arahesis
<arfrever.fta_at_gmail.com> wrote:
> Python 3 contains major changes in handling of strings.
> http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
>
> str type was renamed to bytes type. ("string" -> b"string")
> unicode type was renamed to str type. (u"string" -> "string")
>
> I will use the Python 3 names of these types in this e-mail.
>
> In Python 2:
> >>> "abc" == u"abc"
> True
> >>>
>
> In Python 3:
> >>> b"abc" == "abc"
> False
> >>>
>
> Explicit encoding / decoding between these types is now required.
>
> bytes.decode() returns str.
> str.encode() returns bytes.
>
> (bytes type doesn't support encode(). str type doesn't support decode().)
>
> subversion/tests/cmdline tests use subprocess.Popen to obtain
> the output of all commands and to send the input to them.
> The subprocess.Popen.{stdin,stdout,stderr} streams support only bytes type.
>
> Encoding / decoding doesn't work with invalid UTF-8 characters.
>
> merge_tests.py 4 ("some simple property merges") test sets some
> properties with invalid UTF-8 characters and later checks the output of svn.
>
> This problem has 2 solutions:
>
> 1. Internally store the output of commands in bytes type, perform some
> encodings/decodings and convert a huge number of strings to bytes type
> (i.e. "string" -> b"string" in source code).
>
> Invalid UTF-8 characters would still be supported by
> subversion/tests/cmdline/svntest.
>
> See the attached, unfinished patch (subversion-svntest-python-3.patch) for
> the "python-3-compatibility" branch which makes basic_tests.py 1 ("basic
> checkout of a wc") test pass with both Python 2.6 and Python 3.0!
>
> 2. Internally store the output of commands in str type, decode the output
> of commands soon after obtaining it from subprocess.Popen, convert a
> significantly smaller number of strings to bytes type and *ban invalid UTF-8
> characters* in subversion/tests/cmdline.
>
> In this case merge_tests.py 4 test would have to be changed to no longer
> set invalid UTF-8 characters in some properties.
>
> See the attached patch (subversion-svntest-decode_subprocess_output.patch)
> for trunk which implements decoding of the output of commands.
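
For illustration, the strictness described in the quoted message can be
reproduced in a few lines of Python 3. This is only a minimal sketch: the
byte string below is a made-up example, not a value taken from
merge_tests.py.

```python
# Python 3: bytes and str never compare equal; conversion is explicit.
valid = b"abc"
assert valid != "abc"
assert valid.decode("utf-8") == "abc"
assert "abc".encode("utf-8") == valid

# Hypothetical property value containing a byte that is not valid UTF-8.
invalid = b"prop\xffvalue"
try:
    invalid.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # Strict decoding refuses the 0xff byte instead of guessing.
    decoded_ok = False
```

With the default "strict" error handler the decode raises, which is why
output containing such bytes cannot simply be turned into str.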

I have decided to implement an improved version of the second
solution. svntest will try to store the output of commands in str type,
but will use bytes type for strings with invalid UTF-8 characters.
bytes type will also have to be used when writing to files opened in
binary mode.
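
The "str where possible, bytes otherwise" idea could look roughly like the
sketch below. The helper names decode_or_bytes() and run_and_capture() are
hypothetical and are not taken from svntest or from the attached patch;
UTF-8 as the encoding is likewise an assumption.

```python
import subprocess
import sys

def decode_or_bytes(data, encoding="utf-8"):
    """Return str if data decodes cleanly, else return the bytes unchanged."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data

def run_and_capture(argv):
    """Run a command and return its stdout as a list of str or bytes lines."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # out is bytes because no text mode was requested
    return [decode_or_bytes(line) for line in out.splitlines(True)]

# Example: capture the output of a trivial child process.
lines = run_and_capture([sys.executable, "-c", "print('hello')"])
```

Decoding line by line keeps the fallback local: a single undecodable line
stays as bytes while the rest of the output is still available as str.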

subversion/tests/cmdline/svntest/wc.py:StateItem.tweak() will contain
a workaround for merge_tests.py 4. The properties set by merge_tests.py
4 (simple_property_merges()) will have bytes type. The expected output
of the error message containing property values with invalid UTF-8
characters will depend on the Python version.

See the attached, unfinished patch (subversion-svntest-python-3-v2.patch).

Summary of test results with Python 3:
  994 tests PASSED
  24 tests SKIPPED
  28 tests XFAILED (1 WORK-IN-PROGRESS)
  43 tests FAILED

23 tests fail due to os.tempnam().
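
os.tempnam() no longer exists in Python 3. One possible replacement, shown
here only as a sketch and not as what the patch does, is tempfile.mkstemp().
The helper name make_temp_path() and the "svntest-" prefix are hypothetical.

```python
import os
import tempfile

def make_temp_path(prefix="svntest-"):
    """Create an empty temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix=prefix)
    os.close(fd)  # only the path is needed; close the descriptor
    return path

path = make_temp_path()
exists = os.path.exists(path)
os.remove(path)  # clean up the file created above
```

Unlike os.tempnam(), mkstemp() actually creates the file, so callers that
expect a nonexistent path would need to remove it first.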

--
Arfrever Frehtes Taifersar Arahesis
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1506819

Received on 2009-04-01 15:05:16 CEST

This is an archived mail posted to the Subversion Dev mailing list.
