[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH]On branch swig-py3: accept both of bytes/str for input char * args

From: Troy Curtis Jr <troycurtisjr_at_gmail.com>
Date: Sun, 13 Jan 2019 23:10:03 -0500

On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
<futatuki_at_yf.bsdclub.org> wrote:
>
> Hi,
>
> I've made a patch swig-py3_at_1850520, to accept both of bytes and str
> objects for input args correspondings char * args of Subversion API
> on Python 3 wrapper functions. It is just a interim report, because
> I have some points I want to make clear, and/or need to fix.
> (Moreover, I've not touched code to convert return value of Python
> callback functions yet)
>
>
> The patch attached modifies 4 kind of input argment translations.
>
> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
> typemap(in) const char * MAY_BE_NULL; allows NULL
> These had done by using 'parse=' typemap modifier, however there is no
> PyArg_Parse() format unit to convert both of str and bytes in py3.
> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
> and use it in for new typemap definition.
>
> consideration:
> * For py2, my patch code uses svn_swig_py_string_to_cstring()
> - It isn't allow Unicode for input, however 's' and 'z' format units
> PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
> it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
> add code to conversion for py2)

Yes I think you should support Unicode in this case, but it turns out
you are most of the way there. If you just remove the IS_PY3
conditional, it will support unicode! The "PyBytes_*" and "PyStr_*"
functions are wrappers provided by the py3c library. The names point
to the concept that they target, and then map it appropriately in Py2
and Py3. So

PyBytes: Sequence of byte values, e.g. "raw data"
   In Py2: str
   In Py3: bytes

PyStr: Character data
   In Py2: Unicode
   In Py3: str

So in Py2, using both PyBytes_* and PyStr_* is not redundant, they
actually do map to different things. At first it can seem a bit
confusing that PyStr means Unicode in Py2 and str in Py3, since the
type in Py2 is called "str". But note in particular that the real
Python API for "String" objects is actually PyString. PyStr is
specifically made up by py3c to represent the concept of "string" and
not necessarily the object "String". PyBytes can just be used as is,
since bytes was added as an alias for str in Py2.7.

> - Difference of TypeError exception message. Pyrg_Parse() reports
> argment by argnum in Python wrapper function. However it can't
> determin in typemap definition code, so my patch code uses argment
> symbol instead.
> * For py3, it seems to need more kindly Exception message for
> UnicodeEncodeError, which can be caused if input str contains surrogate
> data (U+DC00 - U+DCFF).
> * test case for UnicodeEncodeError is needed
>
> (2) in core.i, typemap for svn_stream_write
>
> consideration:
> * As this typemap doesn't seems to accept Unicode on py2, there seems to
> be no regression like described in (1)
> * As this typemap only used for svn_stream_write wrapper which have only
> one char * arg, default UnicodeEncodeError message seems to be sufficient.
>
> (3) typemap(in) apr_hash_t * (for various types)
> These are using make_string_from_ob() for hash key string conversion,
> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
> make_svn_strinf_from_ob() for hash value conversion
>
> consideration:
> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
> but current implementation of conversion function doesn't allows. Isn't
> it needed? (I added test for this case, but disabled until it make clear)
> * test case for UnicodeEncodeError is needed (for both of hash key and
> hash value)

Yes this looks like a good catch, the generic conversion function
should handle the NULL/None case.

> (4) typemap(in) apr_array_header_t *STRINGLIST
> This typemap is using svn_swig_py_unwrap_string() through the function
> svn_swig_py_seq_to_array().
>
> consideration:
> * test case for UnicodeEncodeError is needed (for both of hash key and
> hash value)
>

And here is the start of my patch review:

Don't forget to put a brief description of the over overall change in
the log message here. And as this is such large change, a detail
section is probably also in order as well.

> [In subversion/bindings/swig]
>
> * core.i (%typemap(in) (const char *data, apr_size_t *len): On Python 3,
> allow Str as well for data argment of svn_stream_write()
>
> * include/svn_global.swg
> (remove)(%typemap(in) char *, char const *, char * const,
> char const * const): Move this typemap into include/svn_strings as
> typemap (in) IN_STRING
>
[snip]
>
> diff --git a/subversion/bindings/swig/core.i b/subversion/bindings/swig/core.i
> index c309d48070..6993ef4c9c 100644
> --- a/subversion/bindings/swig/core.i
> +++ b/subversion/bindings/swig/core.i
> @@ -442,18 +442,31 @@
> */
> #ifdef SWIGPYTHON
> %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
> - char *tmpdata;
> Py_ssize_t length;
> - if (!PyBytes_Check($input)) {
> - PyErr_SetString(PyExc_TypeError,
> - "expecting a bytes object for the buffer");
> - SWIG_fail;
> + if (PyBytes_Check($input)) {
> + if (PyBytes_AsStringAndSize($input, (char **)&$1, &length) == -1) {
> + SWIG_fail;
> + }
> + }
> +%#if IS_PY3
> + else if (PyStr_Check($input)) {
> + $1 = PyStr_AsUTF8AndSize($input, &length);
> + if (PyErr_Occurred()) {
> + SWIG_fail;
> + }
> }
> - if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
> +%#endif

Don't use the IS_PY3 conditional here, and you get Unicode (on Py2)
conversion automatically.

> + else {
> + PyErr_SetString(PyExc_TypeError,
> +%#if IS_PY3
> + "expecting a bytes or str object for the buffer"
> +%#else
> + "expecting a string for the buffer"
> +%#endif

I think it is valid to say "bytes or string" object for both Py2 and
Py3, thus getting rid of another conditional.

> + );
> SWIG_fail;
> }
> temp = ($*2_type)length;
> - $1 = tmpdata;
> $2 = ($2_ltype)&temp;
> }
> #endif

[snip]

> diff --git a/subversion/bindings/swig/include/svn_string.swg b/subversion/bindings/swig/include/svn_string.swg
> index 0fc64ebdcc..8be4c3d746 100644
> --- a/subversion/bindings/swig/include/svn_string.swg
> +++ b/subversion/bindings/swig/include/svn_string.swg
> @@ -251,6 +251,26 @@ typedef struct svn_string_t svn_string_t;
> }
> #endif
>
> + /* -----------------------------------------------------------------------
> + Type: char * (input)
> +*/
> +#ifdef SWIGPYTHON
> +%typemap (in) IN_STRING
> +{
> + $1 = svn_swig_py_string_to_cstring($input, FALSE, "$symname", "$1_name");
> + if (PyErr_Occurred()) SWIG_fail;
> +}
> +
> +%typemap (freearg) IN_STRING "";

What does freearg here do? Looking at the docs, it seems this supposed
to be used to free allocated resources once the wrapper function
exits.

> +
> +%apply IN_STRING {
> + const char *,
> + char *,
> + char const *,
> + char * const,
> + char const * const
> +};
> +#endif
> /* -----------------------------------------------------------------------
> define a way to return a 'const char *'
> */
> diff --git a/subversion/bindings/swig/include/svn_types.swg b/subversion/bindings/swig/include/svn_types.swg
> index 319f7daa6b..7c933b1ac7 100644
> --- a/subversion/bindings/swig/include/svn_types.swg
> +++ b/subversion/bindings/swig/include/svn_types.swg
> @@ -348,12 +348,8 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self)
> #ifdef SWIGPYTHON
> %typemap(in) const char *MAY_BE_NULL
> {
> - if ($input == Py_None) {
> - $1 = NULL;
> - } else {
> - $1 = PyBytes_AsString($input);
> - if ($1 == NULL) SWIG_fail;
> - }
> + $1 = svn_swig_py_string_to_cstring($input, TRUE, "$symname", "$1_name");
> + if (PyErr_Occurred()) SWIG_fail;
> }
> #endif
>
> diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> index 8a4ec631be..58cfec30a3 100644
> --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> @@ -506,6 +506,32 @@ void svn_swig_py_svn_exception(svn_error_t *error_chain)
>
> /*** Helper/Conversion Routines ***/
>
> +/* Function to get char * representation of bytes/str object */
> +char *svn_swig_py_string_to_cstring(PyObject *input, int maybe_null,
> + const char * funcsym, const char * argsym)

I think we need to be very careful with the return value here, both
the type and the lifetime. The trouble with using the value returned
directly from the C api below is that they typically return a pointer
to an internal buffer which must not be modified. Returning it as
"char *" is probably not the right idea. Additionally, these are
internal to the "input" object, so if something like:

  PyObject *foo = <create Py String>;
  char *my val = svn_swig_py_string_to_cstring(foo, ...);
  Py_DECREF(foo);

  svn_awesome_api(my_val, ...); /* This could explode */

Of course, we do need to handle the case of an input "char *"
argument, so I think the best option is probably to use
"apr_pstrdup()" to allocate a new string from the application pool.

> +{
> + char *retval = NULL;
> + if (PyBytes_Check(input)) {
> + retval = PyBytes_AsString(input);
> + }
> +#if IS_PY3
> + else if (PyStr_Check(input)) {
> + retval = (char *)PyStr_AsUTF8(input);
> + }
> +#endif

Remove this IF_PY3 conditional as well.

> + else if (input != Py_None || ! maybe_null) {
> + PyErr_Format(PyExc_TypeError,
> +#if IS_PY3
> + "%s() argument %s must be bytes or str%s, not %s",
> +#else
> + "%s() argument %s must be string%s, not %s",
> +#endif
> + funcsym, argsym, maybe_null?" or None":"",
> + Py_TYPE(input)->tp_name);
> + }
> + return retval;
> +}
> +

[ As far as my review got for now]

I'll continue reviewing over the next few days, that is all the time I
have for tonight!

Troy
Received on 2019-01-14 05:10:28 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.