[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [swig-py3][patch] interfacing bytes object instead of str

From: Troy Curtis Jr <troycurtisjr_at_apache.org>
Date: Mon, 24 Dec 2018 22:20:40 -0500

On Mon, Dec 24, 2018 at 1:11 PM Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>
wrote:

> On 12/17/18 5:50 AM, Yasuhito FUTATSUKI wrote:
> > On 12/17/18 2:08 AM, Troy Curtis Jr wrote:
>

[snip]

> Status update: I've done to make patch that default to map bytes
> from char *, svn_string_t, and svn_stringbuf_t, for swig-py3_at_1849037.
>

Awesome! I'm really glad you are helping work on this, helps to keep me
motivated as well ;)

So I wonder, as a guiding principle if we could support either Str or Bytes
as an input? That would be akin to most
of the standard module interfaces. I think that would be fairly easy to
support, and would help ensure that the
behavior is much closer to the typical expectation. it would also reduce
the number of b"" that are necessary :P Of
course, in many cases for the standard library interfaces, the input type
is used as a hint to indicate what type to
return, which is much more difficult/impossible to fully support in these
bindings. That seems like it would provide
a way to make using the bindings more familiar, though we'd still have to
have the policy of just always returning Bytes
objects as output. That input/output disconnect might undo any ease of use
advantage than just always using Bytes
...I'm not sure exactly what my opinion on that is...

[ snip ]

Cheers,
> --
> Yasuhito FUTATSUKI
>

The patch inline for comments:

diff --git a/build/ac-macros/swig.m4 b/build/ac-macros/swig.m4
> index d865689de5..c797ae69d8 100644
> --- a/build/ac-macros/swig.m4
> +++ b/build/ac-macros/swig.m4
> @@ -154,7 +154,7 @@ AC_DEFUN(SVN_FIND_SWIG,
> ])
>
> if test "$ac_cv_python_is_py3" = "yes"; then
> - SWIG_PY_OPTS="-python -py3"
> + SWIG_PY_OPTS="-python -py3 -DPY3"
> else
> SWIG_PY_OPTS="-python -classic"
> fi
>

We shouldn't define a new preprocessor value here, but rather make use of
the existing IS_PY3 value defined by py3c.

diff --git a/subversion/bindings/swig/core.i
> b/subversion/bindings/swig/core.i
> index e5a234c5e2..c309d48070 100644
> --- a/subversion/bindings/swig/core.i
> +++ b/subversion/bindings/swig/core.i
>

[snip]

@@ -442,16 +442,18 @@
> */
> #ifdef SWIGPYTHON
> %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
> - if (!PyStr_Check($input)) {
> + char *tmpdata;
> + Py_ssize_t length;
> + if (!PyBytes_Check($input)) {
> PyErr_SetString(PyExc_TypeError,
> - "expecting a string for the buffer");
> + "expecting a bytes object for the buffer");
> SWIG_fail;
> }
> - /* Explicitly cast away const from PyStr_AsUTF8AndSize (in Python 3).
> The
> - swig generated argument ($1) here will be "char *", but since its
> only
> - use is to pass directly into the "const char *" argument of the
> wrapped
> - function, this is safe to do. */
> - $1 = (char *)PyStr_AsUTF8AndSize($input, &temp);
>

> + if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
>

Why not just use "&$1" here instead of introducing a new temp variable?

+ SWIG_fail;
> + }
> + temp = ($*2_type)length;
> + $1 = tmpdata;
> $2 = ($2_ltype)&temp;
> }
> #endif
>
@@ -504,8 +506,8 @@
> SWIG_fail;
> }
>
> - if (PyStr_Check($input)) {
> - const char *value = PyStr_AsString($input);
> + if (PyBytes_Check($input)) {
> + const char *value = PyBytes_AsString($input);
> $1 = apr_pstrdup(_global_pool, value);
> }
> else if (PyLong_Check($input)) {
>

Since this function does a whole list of checks and conversions, perhaps it
makes sense to support Str here in
addition to bytes? Of course in Py2 the "PyStr_Check()" would effectively
be performed twice if we did that, but
I don't expect that to be a tremendous impact, and with the "cost" of an
addition "IS_PY3" preprocessor check,
that could be eliminated from the Py2 build.

> diff --git a/subversion/bindings/swig/include/svn_global.swg
> b/subversion/bindings/swig/include/svn_global.swg
> index 191bbdd531..f7364be28f 100644
> --- a/subversion/bindings/swig/include/svn_global.swg
> +++ b/subversion/bindings/swig/include/svn_global.swg
> @@ -31,6 +31,12 @@
> #define SVN_DEPRECATED
> #endif
>
> +#ifdef SWIGPYTHON
> +%begin %{
> +#define SWIG_PYTHON_STRICT_BYTE_CHAR
> +%}
> +#endif
> +
> %include typemaps.i
> %include constraints.i
> %include exception.i
> @@ -136,9 +142,15 @@ static PyObject * _global_py_pool = NULL;
> /* Python format specifiers. Use Python instead of SWIG to parse
> these basic types, because Python reports better error messages
> (with correct argument numbers). */
> +#if defined(PY3)
>

This is where we'd want to use py3c's "IS_PY3" value.

> +%typemap (in, parse="y")
> + char *, char const *, char * const, char const * const "";
> +#else
> %typemap (in, parse="s")
> char *, char const *, char * const, char const * const "";
> +#endif
> %typemap (in, parse="c") char "";
> +
> %typemap (in, fragment=SWIG_As_frag(long)) long
> {
> $1 = ($1_ltype)SWIG_As(long)($input);
>

[snip]

> diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> index 2301d66770..44ec8662b7 100644
> --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> @@ -547,23 +547,23 @@ static char *make_string_from_ob(PyObject *ob,
> apr_pool_t *pool)
>

Probably a reasonable place to support either Bytes or Str.

 {
> if (ob == Py_None)
> return NULL;
> - if (! PyStr_Check(ob))
> + if (! PyBytes_Check(ob))
> {
> - PyErr_SetString(PyExc_TypeError, "not a string");
> + PyErr_SetString(PyExc_TypeError, "not a bytes object");
> return NULL;
> }
> - return apr_pstrdup(pool, PyStr_AsString(ob));
> + return apr_pstrdup(pool, PyBytes_AsString(ob));
> }
> static svn_string_t *make_svn_string_from_ob(PyObject *ob, apr_pool_t
> *pool)
> {
> if (ob == Py_None)
> return NULL;
> - if (! PyStr_Check(ob))
> + if (! PyBytes_Check(ob))
> {
> - PyErr_SetString(PyExc_TypeError, "not a string");
> + PyErr_SetString(PyExc_TypeError, "not a bytes object");
> return NULL;
> }
> - return svn_string_create(PyStr_AsString(ob), pool);
> + return svn_string_create(PyBytes_AsString(ob), pool);
> }
>
>
> @@ -598,7 +598,7 @@ static PyObject *convert_hash(apr_hash_t *hash,
> return NULL;
> }
> /* ### gotta cast this thing cuz Python doesn't use "const" */
> - if (PyDict_SetItemString(dict, (char *)key, value) == -1)
> + if (PyDict_SetItem(dict, PyBytes_FromString((char *)key), value)
> == -1)
> {
> Py_DECREF(value);
> Py_DECREF(dict);
> @@ -623,10 +623,10 @@ static PyObject *convert_svn_string_t(void *value,
> void *ctx,
>
> const svn_string_t *s = value;
>
> - return PyStr_FromStringAndSize(s->data, s->len);
> + return PyBytes_FromStringAndSize(s->data, s->len);
> }
>
> -/* Convert a C string into a Python String object (or a reference to
> +/* Convert a C string into a Python Bytes object (or a reference to
> Py_None if CSTRING is NULL). */
> static PyObject *cstring_to_pystring(const char *cstring)
>

Doesn't returning PyBytes here make this function incorrectly named now? If
this were renamed to
"cstring_to_pybytes" it would be better, since that also matches the
Python2.7 idiom of bytes being alias of str.
And this is a static function, so we are safe to change it from an ABI
standpoint.

> {
> @@ -636,7 +636,7 @@ static PyObject *cstring_to_pystring(const char
> *cstring)
> Py_INCREF(Py_None);
> return retval;
> }
> - return PyStr_FromString(cstring);
> + return PyBytes_FromString(cstring);
> }
>
> static PyObject *convert_svn_client_commit_item3_t(void *value, void *ctx)
>

[snip]

@@ -1321,7 +1321,7 @@ svn_swig_py_unwrap_string(PyObject *source,
> void *baton)
>

Requiring "source" to be a Bytes object, makes this function name be
incorrect, since giving it a string will
cause an error. Probably a reasonable place to support both Bytes/str.
This is an exported function, so changing
the name isn't a good option.

 {
> const char **ptr_dest = destination;
> - *ptr_dest = PyStr_AsString(source);
> + *ptr_dest = PyBytes_AsString(source);
>
> if (*ptr_dest != NULL)
> return 0;
> @@ -1440,7 +1440,7 @@ PyObject *svn_swig_py_array_to_list(const
> apr_array_header_t *array)
> for (i = 0; i < array->nelts; ++i)
> {
> PyObject *ob =
> - PyStr_FromString(APR_ARRAY_IDX(array, i, const char *));
> + PyBytes_FromString(APR_ARRAY_IDX(array, i, const char *));
> if (ob == NULL)
> goto error;
> PyList_SET_ITEM(list, i, ob);
> @@ -1748,7 +1748,8 @@ static svn_error_t *delete_entry(const char *path,
>
> /* ### python doesn't have 'const' on the method name and format */
> if ((result = PyObject_CallMethod(ib->editor, (char *)"delete_entry",
> - (char *)"slOO&", path, revision,
> ib->baton,
> + (char *)SVN_SWIG_BYTES_FMT "lOO&",
> + path, revision, ib->baton,
> make_ob_pool, pool)) == NULL)
> {
> err = callback_exception_error();
> @@ -1779,7 +1780,11 @@ static svn_error_t *add_directory(const char *path,
>
> /* ### python doesn't have 'const' on the method name and format */
> if ((result = PyObject_CallMethod(ib->editor, (char *)"add_directory",
> +#if PY_VERSION_HEX >= 0x03000000
> + (char *)"yOylO&", path, ib->baton,
> +#else
> (char *)"sOslO&", path, ib->baton,
> +#endif
>

We definitly want to use the existing "IS_PY3" preprocesser value here. And
yes I think the
definition SVN_SWIG_BYTES_FMT should be changed to use IS_PY3 instead of
the version check as well
(which is the precedence I assume you are following). The same applies to
all the future usages
of PY_VERSION_HEX below as well.

I also think that using SVN_SWIG_BYTES_FMT here would better communicate
purpose. However, I recognize
that that will really chop up the format string, so the "#if #else" seems
reasonable enough. But perhaps we should *only*
include the format string the definition, just to make it more clear what
piece we are doing the conditional compile on?

                                     copyfrom_path, copyfrom_revision,
> make_ob_pool, dir_pool)) == NULL)
> {
> @@ -1810,8 +1815,8 @@ static svn_error_t *open_directory(const char *path,
>
> /* ### python doesn't have 'const' on the method name and format */
> if ((result = PyObject_CallMethod(ib->editor, (char *)"open_directory",
> - (char *)"sOlO&", path, ib->baton,
> - base_revision,
> + (char *)SVN_SWIG_BYTES_FMT "OlO&",
> + path, ib->baton, base_revision,
> make_ob_pool, dir_pool)) == NULL)
> {
> err = callback_exception_error();
>

[snip]

> @@ -2508,10 +2542,10 @@ apr_file_t *svn_swig_py_make_file(PyObject
> *py_file,
> if (py_file == NULL || py_file == Py_None)
> return NULL;
>
>

Since this function is doing type checking between Bytes/File object
anyway, perhaps it'd
be a good place to also support Str?

- if (PyStr_Check(py_file))
> + if (PyBytes_Check(py_file))
> {
> /* input is a path -- just open an apr_file_t */
> - const char* fname = PyStr_AsString(py_file);
> + const char* fname = PyBytes_AsString(py_file);
> apr_err = apr_file_open(&apr_file, fname,
> APR_CREATE | APR_READ | APR_WRITE,
> APR_OS_DEFAULT, pool);
>

[snip]

> @@ -3053,16 +3098,16 @@ svn_error_t *svn_swig_py_get_commit_log_func(const
> char **log_msg,
> *log_msg = NULL;
> err = SVN_NO_ERROR;
> }
> - else if (PyStr_Check(result))
>

If we were going to special-case places that are clearly expected to be
string, this log_func place would
probably one of them.

+ else if (PyBytes_Check(result))
> {
> - *log_msg = apr_pstrdup(pool, PyStr_AsString(result));
> + *log_msg = apr_pstrdup(pool, PyBytes_AsString(result));
> Py_DECREF(result);
> err = SVN_NO_ERROR;
> }
> else
> {
> Py_DECREF(result);
> - err = callback_bad_return_error("Not a string");
> + err = callback_bad_return_error("Not a bytes object");
> }
>
> finished:
>

[snip]

> diff --git
> a/subversion/bindings/swig/python/tests/trac/versioncontrol/main.py
> b/subversion/bindings/swig/python/tests/trac/versioncontrol/main.py
> index 1b8a87c3f4..0d5d0618de 100644
> --- a/subversion/bindings/swig/python/tests/trac/versioncontrol/main.py
> +++ b/subversion/bindings/swig/python/tests/trac/versioncontrol/main.py
> @@ -154,12 +154,12 @@ class Node(object):
> Represents a directory or file in the repository.
> """
>
> - DIRECTORY = "dir"
> - FILE = "file"
> + DIRECTORY = b"dir"
> + FILE = b"file"
>
> def __init__(self, path, rev, kind):
> assert kind in (Node.DIRECTORY, Node.FILE), "Unknown node kind
> %s" % kind
> - self.path = str(path)
> + self.path = path
>

Dropping use of "str()" changes the Py2 interface since in the current
incarnation you could pass a number, or some object which
defines __str__ and it will get used correctly.So I think here we have to
do an isinstanceof check so that we don't try to run "str"
on a bytes object, but otherwise continuing to use str()

> self.rev = rev
> self.kind = kind
>

[snip]

diff --git
> a/subversion/bindings/swig/python/tests/trac/versioncontrol/svn_fs.py
> b/subversion/bindings/swig/python/tests/trac/versioncontrol/svn_fs.py
> index e0aef2df0d..fe92bb70b7 100644
> --- a/subversion/bindings/swig/python/tests/trac/versioncontrol/svn_fs.py
> +++ b/subversion/bindings/swig/python/tests/trac/versioncontrol/svn_fs.py
> @@ -55,9 +55,12 @@ import os.path
> import time
> import weakref
> import posixpath
> +import sys
>
> from svn import fs, repos, core, delta
>
> +IS_PY3 = sys.version_info[0] >= 3
> +
> _kindmap = {core.svn_node_dir: Node.DIRECTORY,
> core.svn_node_file: Node.FILE}
>
> @@ -290,7 +293,7 @@ class SubversionNode(Node):
> def __init__(self, path, rev, authz, scope, fs_ptr):
> self.authz = authz
> self.scope = scope
> - if scope != '/':
> + if scope != b'/':
> self.scoped_path = scope + path
> else:
> self.scoped_path = path
> @@ -300,7 +303,11 @@ class SubversionNode(Node):
> self.root = fs.revision_root(fs_ptr, rev)
> node_type = fs.check_path(self.root, self.scoped_path)
> if not node_type in _kindmap:
> - raise TracError("No node at %s in revision %s" % (path, rev))
> + if IS_PY3:
> + raise TracError("No node at %s in revision %s"
> + % (path.decode('UTF-8'), rev))
> + else:
> + raise TracError("No node at %s in revision %s" % (path,
> rev))
>

Instead of having an explicit IS_PY3 check here, we should just store a
unicode string in the exception object.

> self.created_rev = fs.node_created_rev(self.root,
> self.scoped_path)
> self.created_path = fs.node_created_path(self.root,
> self.scoped_path)
> # Note: 'created_path' differs from 'path' if the last change was
> a copy,
> @@ -323,7 +330,7 @@ class SubversionNode(Node):
> return
> entries = fs.dir_entries(self.root, self.scoped_path)
> for item in entries:
> - path = '/'.join((self.path, item))
> + path = b'/'.join((self.path, item))
> if not self.authz.has_permission(path):
> continue
> yield SubversionNode(path, self._requested_rev, self.authz,
> @@ -350,7 +357,7 @@ class SubversionNode(Node):
> def get_properties(self):
> props = fs.node_proplist(self.root, self.scoped_path)
> for name,value in core._as_list(props.items()):
> - props[name] = str(value) # Make sure the value is a proper
> string
> + props[name] = value
>

Dropping str() here changes the API of the Py2 interface, just like the
other str() drop mentioned above.

> return props
>
> def get_content_length(self):
>
>
[snip]
Received on 2018-12-25 04:21:08 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.