On 2020/05/15 10:32, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Thu, 14 May 2020 21:44 +00:00:
>> Thank you for the review and lesson on my commit message. I intended
>> to send this commit message with the patch before commit, but missed
>> by mistakes.
>>
>> I added note about quotation characters which could also cause SyntaxError.
>>
>> Before replace this log message, I'd like to get a review again.
>>
>
> Thanks for your diligence.
>
>> [[[
>> entries-dump: Escape string-typed attribute values when serializing
>> them as Python string literals.
>>
>> Before this commit, a filesystem node named "foo\bar" (a single,
>> 7-character path component) would cause "e.name = 'foo\bar'" to be
>> emitted. The unescaped backslash would manifest as a test failure or
>> a SyntaxError, depending on the following characters.
>>
>
> *nod* Changing the Unicode quotes to double quotes is fine.
>
> (I generally use Unicode quotes so it's clear which quotes delimit code
> from English prose and which quotes are literal parts of the code.
> However, that's just me.)
>
>> Also, user names can contain "'" (a single quote character) and/or """
>> (a double quote character), which would potentially cause a SyntaxError
>> even if we choose ether of them to quote string literals.
>
> Something is not clear to me in this paragraph. I do see that if we had
> written «printf("'%s'", value)», SyntaxError's could still have resulted;
> however, I don't see how that is a reason not to choose single quotes.
> We _could_ still have chosen single quotes if we had escaped backslashes,
> single quotes, and newlines in «value» before printing it, couldn't we?
>
> It's not clear to me what this paragraph is trying to explain: whether
> it's trying to explain why the code generates a «bytes» object as
> opposed to a «str» object, or to explain the choice to escape every byte
> (even those that don't _require_ escaping according to Python's bytes
> literals syntax), or something else.
>
> Also, s/ether/either/.
Thanks for the review, again. I intended to to explain there can be more
case that we should use escape expression, and I'm sure it wasn't
achieved.
[[[
entries-dump: Escape string-typed attribute values when serializing
them as Python string literals.
Before this commit, a filesystem node named "foo\bar" (a single,
7-character path component) would cause "e.name = 'foo\bar'" to be
emitted. The unescaped backslash would manifest as a test failure or
a SyntaxError, depending on the following characters.
This was triggered by update_tests.py 76 windows_update_backslash under
Python 3 on Windows.
There can be some other characters that should be escaped. For example,
user names can contain "'" (a single quote character) and/or """ (a
double quote character), which would potentially cause a SyntaxError
even if we choose either of them to quote string literals. To avoid to
overlook such potentially unsafe characters, I decide to use hex value
escape for all characters.
Furthermore, to ensure that values are decoded to Unicode as UTF-8 byte
sequences when we use hex value escape under Python 3, we print them as
bytes value and then encode it.
* subversion/tests/cmdline/entries-dump.c
(print_prefix): New function.
(str_value):
- Add argument to specify pool.
- Print human readable value of "value" as is in comment, then set it
as str value by using hex escaped bytes literal.
(entries_dump): Add pool argument to str_value() calls.
(main):
- Print "Entry" class definition as prefix before entry_dump() or tree_dump()
- Style fix on if statement (using blocks).
(): Add include files for assert() and svn_xml_escape_attr_cstring()
* subversion/tests/cmdline/svntest/main.py
(run_entiresdump, run_entriesdump_tree): Move definition of "Entry" class
into generated code by entries-dump execution.
Found by: svn-windows-ra buildbot
Review by: danielsh
]]]
Thanks,
--
Yasuhito FUTATSUKI <futatuki_at_poem.co.jp>/<futatuki_at_yf.bsdclub.org>
Received on 2020-05-17 21:54:53 CEST