On 2020/05/15 10:32, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Thu, 14 May 2020 21:44 +00:00:
>> Thank you for the review and lesson on my commit message. I intended
>> to send this commit message with the patch before commit, but missed
>> by mistakes.
>> I added note about quotation characters which could also cause SyntaxError.
>> Before replace this log message, I'd like to get a review again.
> Thanks for your diligence.
>> entries-dump: Escape string-typed attribute values when serializing
>> them as Python string literals.
>> Before this commit, a filesystem node named "foo\bar" (a single,
>> 7-character path component) would cause "e.name = 'foo\bar'" to be
>> emitted. The unescaped backslash would manifest as a test failure or
>> a SyntaxError, depending on the following characters.
> *nod* Changing the Unicode quotes to double quotes is fine.
> (I generally use Unicode quotes so it's clear which quotes delimit code
> from English prose and which quotes are literal parts of the code.
> However, that's just me.)
>> Also, user names can contain "'" (a single quote character) and/or """
>> (a double quote character), which would potentially cause a SyntaxError
>> even if we choose ether of them to quote string literals.
> Something is not clear to me in this paragraph. I do see that if we had
> written «printf("'%s'", value)», SyntaxError's could still have resulted;
> however, I don't see how that is a reason not to choose single quotes.
> We _could_ still have chosen single quotes if we had escaped backslashes,
> single quotes, and newlines in «value» before printing it, couldn't we?
> It's not clear to me what this paragraph is trying to explain: whether
> it's trying to explain why the code generates a «bytes» object as
> opposed to a «str» object, or to explain the choice to escape every byte
> (even those that don't _require_ escaping according to Python's bytes
> literals syntax), or something else.
> Also, s/ether/either/.
Thanks for the review, again. I intended to explain that there can be more
cases where we should use escape sequences, and I'm sure it wasn't
entries-dump: Escape string-typed attribute values when serializing
them as Python string literals.
Before this commit, a filesystem node named "foo\bar" (a single,
7-character path component) would cause "e.name = 'foo\bar'" to be
emitted. The unescaped backslash would manifest as a test failure or
a SyntaxError, depending on the following characters.
This was triggered by update_tests.py 76 windows_update_backslash under
Python 3 on Windows.
There can be other characters that should be escaped. For example,
user names can contain "'" (a single quote character) and/or """ (a
double quote character), which could cause a SyntaxError whichever of
them we choose to quote string literals. To avoid overlooking such
potentially unsafe characters, I decided to use hex escapes for all
characters.
Furthermore, to ensure that values are decoded to Unicode as UTF-8 byte
sequences when we use hex escapes under Python 3, we print them as
bytes literals and then decode them.
(print_prefix): New function.
- Add argument to specify pool.
- Print the human-readable form of "value" as is in a comment, then set it
as a str value by using a hex-escaped bytes literal.
(entries_dump): Add pool argument to str_value() calls.
- Print "Entry" class definition as prefix before entry_dump() or tree_dump()
- Style fix on if statement (using blocks).
(): Add include files for assert() and svn_xml_escape_attr_cstring()
(run_entriesdump, run_entriesdump_tree): Move definition of "Entry" class
into the code generated by running entries-dump.
Found by: svn-windows-ra buildbot
Review by: danielsh
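
To illustrate the log message above, here is a minimal Python 3 sketch of the failure and of the hex-escape approach. It is not the C code in entries-dump: the helper names hex_escaped_assignment() and roundtrip() and the bare Entry class are hypothetical, and the exact form of the emitted statement is an assumption made for illustration.

```python
def hex_escaped_assignment(attr, value):
    """Build a Python statement that assigns `value` to e.<attr>.

    Hypothetical helper: every UTF-8 byte is written as \\xNN inside a
    bytes literal, so no backslash or quote from `value` reaches the
    generated source unescaped.
    """
    raw = value.encode("utf-8")
    literal = "".join("\\x%02x" % byte for byte in raw)
    return "e.%s = b'%s'.decode('utf-8')" % (attr, literal)


class Entry(object):
    """Stand-in for the Entry class used by the generated code."""
    pass


def roundtrip(value):
    """Execute the generated statement and return the decoded attribute."""
    e = Entry()
    exec(hex_escaped_assignment("name", value), {"e": e})
    return e.name


# Naive quoting: the generated source is 'foo\bar', where \b is read as
# a backspace, so the name silently changes (the test-failure case).
naive = eval("'foo\\bar'")

# Naive quoting: the generated source is 'foo\xyz', where \x must be
# followed by two hex digits, so parsing fails (the SyntaxError case).
try:
    eval("'foo\\xyz'")
    naive_error = None
except SyntaxError as exc:
    naive_error = exc

# With hex escapes, backslashes, both quote characters and non-ASCII
# text all survive unchanged.
samples = ["foo\\bar", "o'brien", 'say "hi"', "caf\u00e9"]
results = [roundtrip(s) for s in samples]
```

Escaping every byte is more than Python's literal syntax strictly requires, but it means there is no list of special characters to keep up to date.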
Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>/<futatuki_at_yf.bsdclub.org>
Received on 2020-05-17 21:54:53 CEST