
Re: [RFC] str versus bytes in subversion/tests/cmdline

From: Arfrever Frehtes Taifersar Arahesis <Arfrever.FTA_at_GMail.Com>
Date: Tue, 31 Mar 2009 15:20:17 +0200

2009-03-31 14:20 Arfrever Frehtes Taifersar Arahesis
<arfrever.fta_at_gmail.com> wrote:
> Python 3 makes major changes to the handling of strings.
> http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
>
> The str type was renamed to bytes. ("string" -> b"string")
> The unicode type was renamed to str. (u"string" -> "string")
>
> I will use the Python 3 names of these types in this e-mail.
>
> In Python 2:
>>>> "abc" == u"abc"
> True
>>>>
>
> In Python 3:
>>>> b"abc" == "abc"
> False
>>>>
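A minimal sketch of the same point, valid for Python 3 only (under Python 2 the first comparison would be True): bytes and str never compare equal, even for identical ASCII text, until one side is explicitly converted.

```python
raw = b"abc"   # bytes, e.g. data read from a pipe
text = "abc"   # str, an ordinary literal in Python 3

print(raw == text)                   # False: bytes never equals str
print(raw.decode("utf-8") == text)   # True: decode the bytes first
print(raw == text.encode("utf-8"))   # True: or encode the str instead
```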
>
> Explicit encoding / decoding between these types is now required.
>
> bytes.decode() returns str.
> str.encode() returns bytes.
>
> (The bytes type has no encode() method, and the str type has no decode() method.)
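Both conversion directions can be checked in a Python 3 interpreter; the sample text "café" is an arbitrary choice:

```python
text = "caf\u00e9"              # str containing a non-ASCII character
data = text.encode("utf-8")     # str.encode() -> bytes
print(data)                     # b'caf\xc3\xa9'
print(data.decode("utf-8") == text)  # bytes.decode() -> str: True

# The opposite methods do not exist in Python 3:
print(hasattr(data, "encode"))  # False
print(hasattr(text, "decode"))  # False
```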
>
> The tests in subversion/tests/cmdline use subprocess.Popen to capture
> the output of all commands and to send input to them.
> The stdin, stdout and stderr pipes of subprocess.Popen accept and return only bytes.
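A minimal sketch of what this means for a test harness, assuming Python 3: anything written to the child must be encoded first, and whatever comes back must be decoded. To stay self-contained it runs the current interpreter (sys.executable) as a stand-in for svn; the echoed line is made up.

```python
import subprocess
import sys

# Child process that echoes its stdin to stdout (a stand-in for svn).
child = "import sys; sys.stdout.write(sys.stdin.read())"

proc = subprocess.Popen([sys.executable, "-c", child],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Input must be bytes, so the str is encoded before it is sent.
out, _ = proc.communicate("A    wc/iota\n".encode("utf-8"))

print(type(out).__name__)                 # bytes
lines = out.decode("utf-8").splitlines()  # decode, then split into lines
print(lines)                              # ['A    wc/iota']
```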
>
> Encoding / decoding doesn't work with invalid UTF-8 sequences: decoding such bytes raises UnicodeDecodeError.
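For example, the byte 0xff never appears in well-formed UTF-8, so a strict decode of a value containing it fails. The helper name is_valid_utf8 is hypothetical:

```python
def is_valid_utf8(data):
    """Return True if the bytes object decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8(b"plain value"))   # True
print(is_valid_utf8(b"value \xff"))    # False: 0xff never occurs in UTF-8
```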
>
> Test 4 of merge_tests.py ("some simple property merges") sets some
> properties containing invalid UTF-8 sequences and later checks the output of svn.
>
> This problem has 2 solutions:
>
> 1. Store the output of commands internally as bytes, perform some
> encodings/decodings, and convert a huge number of strings to bytes
> (i.e. "string" -> b"string" in the source code).
>
> Invalid UTF-8 sequences would still be supported by
> subversion/tests/cmdline/svntest.
>
> See the attached, unfinished patch (subversion-svntest-python-3.patch) for
> the "python-3-compatibility" branch, which makes test 1 of basic_tests.py
> ("basic checkout of a wc") pass with both Python 2.6 and Python 3.0!
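A rough sketch of the first approach, assuming Python 3. The helper run_and_capture is hypothetical (it is not part of svntest), and the child process only simulates svn printing a property value that is not valid UTF-8. Because nothing is decoded, the odd byte survives and can be matched against a bytes literal:

```python
import subprocess
import sys

def run_and_capture(*args):
    """Hypothetical helper: run a command and return its stdout as a
    list of bytes lines, without decoding anything."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.splitlines()

# Simulated command output containing the byte 0xff (invalid UTF-8).
child = "import sys; sys.stdout.buffer.write(b'  prop: \\xff\\n')"
actual = run_and_capture(sys.executable, "-c", child)

# Expected output is written as bytes literals (b"..." instead of "...").
expected = [b"  prop: \xff"]
print(actual == expected)   # True
```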

I forgot to mention that, with this patch applied, paths are stored as str,
while most other variables (e.g. property values) are stored as bytes.
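One way to picture that split, using a made-up line format "<path> <name>=<value>" (not real svn output) and a hypothetical helper name: the path is decoded to str on the assumption that paths are valid UTF-8, while the property name and value stay bytes so that invalid UTF-8 in the value survives.

```python
def split_prop_line(raw):
    """Hypothetical parser for a made-up b'<path> <name>=<value>' line.

    Returns (path as str, name as bytes, value as bytes)."""
    path, _, rest = raw.partition(b" ")
    name, _, value = rest.partition(b"=")
    return path.decode("utf-8"), name, value

print(split_prop_line(b"A/mu foo=bar\xff"))
# ('A/mu', b'foo', b'bar\xff')
```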

(I noticed that this patch contains unrelated changes to
tools/po/l10n-report.py.)

I would also like to mention that Python 2 is very tolerant of encoding /
decoding (e.g. it allows calling str.encode() or unicode.decode()), so the
changes related to encoding / decoding will be merged to trunk.
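A helper in that spirit, sketched under the assumption that it must run on both Python 2.6 and 3.x (the name ensure_text is hypothetical). It only decodes values that really are byte strings, so it behaves consistently on both versions; in Python 2.6 the name bytes is an alias of str.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding only if given bytes.

    On Python 2.6+, bytes is an alias of str, so this returns unicode;
    on Python 3 it returns str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

print(ensure_text(b"A/mu") == "A/mu")   # True
print(ensure_text("A/mu") == "A/mu")    # True
```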

PS. I would like to thank Arfrever for implementing __eq__() and __ne__() so
that comparison of instances of ExpectedOutput works at all.
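For context: Python 3 no longer calls __cmp__(), and Python 2 does not derive != from ==, so a class meant to be compared under both should define __eq__() and __ne__() explicitly. The class below is a hypothetical stand-in, not svntest's actual ExpectedOutput:

```python
class ExpectedLines(object):
    """Hypothetical stand-in for an expected-output object."""

    def __init__(self, lines):
        self.lines = list(lines)

    def __eq__(self, other):
        if isinstance(other, ExpectedLines):
            return self.lines == other.lines
        if isinstance(other, list):
            return self.lines == other
        return NotImplemented

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so spell it out.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    # Mutable and compared by value, so make instances unhashable
    # (Python 3 does this implicitly when __eq__ is defined; Python 2 does not).
    __hash__ = None

expected = ExpectedLines([b"A    wc/iota"])
print(expected == [b"A    wc/iota"])   # True
print(expected != [b"A    wc/mu"])     # True
```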

--
Arfrever Frehtes Taifersar Arahesis
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1495758
Received on 2009-03-31 15:20:36 CEST

This is an archived mail posted to the Subversion Dev mailing list.
