
(swig-py3) how to handle "invalid" property value?

From: Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>
Date: Tue, 11 Dec 2018 01:18:19 +0900

Hi,
I hear that property values may not be valid UTF-8 strings if they were
set with validation skipped.
(https://twitter.com/jun66j5/status/1067295499907084288 (written in
Japanese); https://trac.edgewall.org/ticket/4321)

However, the current swig-py3 typemaps that return those values use
PyStr_FromStringAndSize() to convert svn_string_t into Python's str object,
which raises UnicodeDecodeError for an invalid UTF-8 octet sequence, so there
is no way to get the exact values.
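
The failure can be reproduced in pure Python, without the bindings. This is
only a rough analogue, assuming the typemap behaves like a strict UTF-8
decode; the byte string below is a made-up example value, not taken from a
real repository.

```python
# Hypothetical property value: Latin-1 encoded text, not valid UTF-8.
raw = b"caf\xe9"

try:
    # 'strict' is the default error handler for bytes.decode().
    value = raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # The original bytes cannot be obtained as a str this way.
    value = None
    error = exc
```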

To resolve this issue, there seem to be several options:

(1). Those APIs always return str (Unicode) with 'strict' conversion.
     If an error occurs, give up on getting these values. (current
     implementation)
(2). Those APIs always return str with 'surrogateescape' conversion.
     If an application wants to detect irregular data, it searches the
     values for characters in the range U+DC00-U+DCFF.
(3). Those APIs always return bytes. If applications want to handle them
     as str, they decode them on the application side.
(4). Those APIs return str for valid data and bytes for invalid data,
     so that there is always a way to get the data.
(5). Other (I have no idea..., though)
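
As a rough illustration of what options (2), (3) and (4) would look like from
the application side, here is a pure-Python sketch. The helper names and the
sample values are hypothetical and are not part of the bindings; the bytes
objects stand in for what the API would hand back under option (3).

```python
def decode_escaped(raw: bytes) -> str:
    """Option (2): always return str, smuggling bad bytes as surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def has_invalid_bytes(text: str) -> bool:
    """Detect escaped bytes in a str produced by decode_escaped().

    'surrogateescape' maps each undecodable byte 0x80-0xFF to
    U+DC80-U+DCFF, which lies inside the U+DC00-U+DCFF range.
    """
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


def decode_or_bytes(raw: bytes):
    """Option (4): str for valid UTF-8, the raw bytes otherwise."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


valid = "caf\u00e9".encode("utf-8")   # well-formed UTF-8
invalid = b"caf\xe9"                  # Latin-1 bytes, not valid UTF-8

escaped = decode_escaped(invalid)
# The original bytes can be recovered losslessly from the escaped str.
round_trip = escaped.encode("utf-8", errors="surrogateescape")
```

With option (2) the caller always gets str and the original bytes remain
recoverable; with option (4) the caller has to check the type of every
returned value.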

I think (2) or (3) is appropriate, but I am not confident.
Any ideas?

-- 
Yasuhito FUTATSUKI
Received on 2018-12-10 18:13:45 CET

This is an archived mail posted to the Subversion Dev mailing list.