Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Sun, 20 Jan 2019 17:50:26 +0900

Patch updated.

In article <CAEZNtZ+SKqiATLwDvzyVf=SULPGhT7usxhZEawNfextPWsbEag_at_mail.gmail.com>
troycurtisjr_at_gmail.com writes:
> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
> <futatuki_at_yf.bsdclub.org> wrote:

>> The patch attached modifies 4 kind of input argment translations.
>>
>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>> typemap(in) const char * MAY_BE_NULL; allows NULL
>> These had done by using 'parse=' typemap modifier, however there is no
>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>> and use it in for new typemap definition.
>>
>> consideration:
>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>> - It isn't allow Unicode for input, however 's' and 'z' format units
>> PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>> it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>> add code to conversion for py2)
>
>
> Yes I think you should support Unicode in this case, but it turns out
> you are most of the way there. If you just remove the IS_PY3
> conditional, it will support unicode! The "PyBytes_*" and "PyStr_*"
> functions are wrappers provided by the py3c library. The names point
> to the concept that they target, and then map it appropriately in Py2
> and Py3. So
>
> PyBytes: Sequence of byte values, e.g. "raw data"
> In Py2: str
> In Py3: bytes
>
> PyStr: Character data
> In Py2: Unicode
> In Py3: str

Now it uses PyUnicode_Check() and PyStr_UTF8(), then it works in py2 and py3
without IS_PY3 conditional here.

>> - Difference of TypeError exception message. Pyrg_Parse() reports
>> argment by argnum in Python wrapper function. However it can't
>> determin in typemap definition code, so my patch code uses argment
>> symbol instead.

This is still left, if we treat it as a regression.

>> * For py3, it seems to need more kindly Exception message for
>> UnicodeEncodeError, which can be caused if input str contains surrogate
>> data (U+DC00 - U+DCFF).

I reconsidered it and abandoned, because of UnicodeEncodeError structure.
It is enought to analyse why it causes.

> * test case for UnicodeEncodeError is needed

Added.

>> (2) in core.i, typemap for svn_stream_write
>>
>> consideration:
>> * As this typemap doesn't seems to accept Unicode on py2, there seems to
>> be no regression like described in (1)
>> * As this typemap only used for svn_stream_write wrapper which have only
>> one char * arg, default UnicodeEncodeError message seems to be sufficient.

Not changed in this patch.

>> (3) typemap(in) apr_hash_t * (for various types)
>> These are using make_string_from_ob() for hash key string conversion,
>> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
>> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
>> make_svn_strinf_from_ob() for hash value conversion
>>
>> consideration:
>> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>> but current implementation of conversion function doesn't allows. Isn't
>> it needed? (I added test for this case, but disabled until it make clear)
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>> hash value)
>
>
> Yes this looks like a good catch, the generic conversion function
> should handle the NULL/None case.

I've fixed it. To fix this, check key and value separately in
svn_swig_py_*_from_dict().

Test case for UnicodeEncodeError is added, and it detect make_svn_strinf_from_ob()
couldn't handle this case :) Yes, I've fixed it, of course.

>> (4) typemap(in) apr_array_header_t *STRINGLIST
>> This typemap is using svn_swig_py_unwrap_string() through the function
>> svn_swig_py_seq_to_array().
>>
>> consideration:
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>> hash value)

Test case for UnicodeEncodeError is added in tests/client.py.

> And here is the start of my patch review:

I've intended to fix problems pointed out, except the comment for
svn_swig_py_string_to_cstring(). For svn_swig_py_string_to_cstring(),
I've added a comment for the reason of return type and why it seems to
sufficient not to duplicate string entity.

Thanks,

-- 
Yasuhito FUTATSUKI

text/plain attachment: swig-py-char-ptr-input-revised-patch.txt

Received on 2019-01-20 09:53:22 CET

This message: [ Message body ]
Next message: Branko ÄŒibej: "Re: SVN 1.10 AuthZ file parsing too strict?!"
Previous message: Branko ÄŒibej: "Re: SVN 1.10 AuthZ file parsing too strict?!"
In reply to: Yasuhito FUTATSUKI: "Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]