[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Some issues on svn propget (Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py)

From: Yasuhito FUTATSUKI <futatuki_at_yf.bsdclub.org>
Date: Sun, 4 Oct 2020 21:56:42 +0900

I changed a subject to let it be noticeable.

(Sorry for reply too late.)

On 2020/09/26 19:12, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 17:05 +0900:
>> On 2020/09/25 16:06, Daniel Shahaf wrote:
>>> futatuki_at_apache.org wrote on Thu, 24 Sep 2020 17:06 -0000:
>>>> Author: futatuki
>>>> Date: Thu Sep 24 17:06:39 2020
>>>> New Revision: 1881985
>>>>
>>>> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
>>>> Log:
>>>> Follow up to r1880192: Fix an EOL issue in test on Windows.
>>>>
>>>> * subversion/tests/cmdline/merge-tests.py
>>>> (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
>>>> in expected values of svn:mergeinfo.
>>>>
>>>> Modified:
>>>> subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>>>
>>>> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>>> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
>>>> ==============================================================================
>>>> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
>>>> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 17:06:39 2020
>>>> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
>>>> )
>>>>
>>>> # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
>>>> +
>>>> + # NOTE: When writing out multi-line prop values in svn:* props, the
>>>> + # client converts to local encoding and local eol style.
>>>> + # Therefore, the expected output must contain the right kind of eoln
>>>> + # strings. That's why we use os.linesep in the tests below, not just
>>>> + # plain '\n'. The _last_ \n is also from the client, but it's not
>>>> + # part of the prop value and it doesn't get converted in the pipe.
>>
>> Before answer the questions, the comment above was brougt from
>> prop_value_conversion() in prop_tests.py.
>>
>
> *nod*
>
>>> I'm confused.
>>>
>>> First, if the client applies any transformation at all to property
>>> values, that'd be a bug. Property values are opaque octet sequences,
>>> just like file contents, so they must be emitted verbatim. It so
>>> happens that values of svn:* properties are UTF-8 with LF line endings,
>>> so that's what Python should see, regardless of the local encoding and
>>> EOL style.
>>
>> I judged that EOL conversion of property values on 'svn pg' isn't bug
>> because this comment is found in test for it.
>
> If you mean prop_value_conversion(), that test isn't about EOL style
> and encodings at all. It's about normalization of the values of some
> svn:* properties, such as this:
> .
> % svn ps -q svn:executable yes iota
> % svn pg svn:executable iota
> *
> %
>
> The test for binary property values is prop_tests.py:binary_props(). It
> doesn't cover newlines.
>
>> If it is a bug, prop_tests.py also does not test correctly.
>
> I've looked further and I think it's working as designed, but the
> design is rather unintuitive.
>
> In the data model, property values are binary data. That's why, in
> the API, property hashes' value type is svn_string_t. However, the
> cmdline client doesn't print all properties as binary data; it prints
> svn:* properties as text, transcoded and EOL-converted, even as it
> prints other properties in binary:
> .
> https://svn.apache.org/viewvc/subversion/tags/1.0.0/subversion/clients/cmdline/props.c?view=markup#l50
>
> The code still exists in trunk_at_HEAD, in propget-cmd.c.
>
> These semantics mean prop_tests.py and merge_tests.py are both correct
> to expect os.linesep in the output.

I've commited fix of the comment only, in 1882105 on last Tuesday.
 
> However, I find these semantics rather unintuitive. Imagine someone
> using a foo:ignore property as staging for svn:ignore:
>
> 1 % svn propset svn:ignore "予定表.txt" ./
> 2 property 'svn:ignore' set on '.'
> 3 % svn propset foo:ignore "予定表.txt" ./
> 4 property 'foo:ignore' set on '.'
> 5 % LC_ALL=ja_JP.eucjp svn pl -v
> 6 Properties on '.':
> 7 foo:ignore
> 8 予定表.txt
> 9 svn:ignore
> 10 ͽɽ.txt
> .
> The same sequence of bytes is emitted differently depending on what
> property it belongs to — once in binary (foo:ignore), once in the local
> encoding (svn:ignore), within the same list (!). (My terminal was in
> UTF-8 mode the whole time.)
>
> 11 % LC_ALL=C svn pg --strict svn:ignore
> 12 {U+4E88}{U+5B9A}{U+8868}.txt
> .
> There is no way in the cmdline client to get the value of an svn:*
> property that's not representable in the local encoding; in part
> because…
>
> 13 % svn propset svn:ignore "{U+4E88}.txt" ./
> 14 property 'svn:ignore' set on '.'
> 15 % sqlite3 .svn/wc.db .dump | me
> 16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
> 17 % svn pg --strict svn:ignore
> 18 {U+4E88}{U+5B9A}{U+8868}.txt
> .
> …the cmdline client's output is not unambiguously parseable. (These
> commands were run in a UTF-8 locale.)
>
> [On line 15 I grepped for the property skel (line 16)'s hexdump
> representation — that's how sqlite3 dumps render BLOB columns — and
> manually extracted and converted it. E.g., to find "4E88", I'd grep
> for its ASCII hex representation, "34453838".]
>
> On the other hand, for «propset» there is a --encoding=UTF-8 flag that
> allows property values not representable in the current encoding to be
> set. (The value still undergoes EOL conversion. I haven't checked
> whether it undergoes composition or decomposition.) However, one still
> has to know that «svn propset foo:ignore -F» takes binary input while
> «svn propset svn:ignore -F» takes input in the local encoding.
>
> So, I think there are a number of different issues/gotchas here:
>
> - prop_tests.py:binary_props() should cover property values that contain
> bytes values such as 0x0A, 0x0D, and 0x80-0x8F (newlines and non-ASCII).

Sure.
 
> - Getting the value of an svn:* property is different to getting the
> value of any other property, in both «propget» and «proplist».

Also it is not explicitly tested in prop_tests.py especially about
transcoding, although it is troublesome to check the transcode about
them because the test suite don't pass environment variables explicitly
to subprocess and there is no warranty that the system running the test
have locale other than "C".

> - It's not possible to get the raw value of an svn:* property in
> a working copy if the value is not representable in the local encoding.

I belive that if we want to get property values precisely, we should
use xml output, although --no-newline is enough in most case except
this case.
 
> - The {U+hhhh} fallback encoding is ambiguous. (And in any case,
> I guess scripts are going to have to roll their own parsing if they
> want to undo this escaping?)

I also think non-xml output is not for scripts, but mainly visualize.

> - Setting the value of an svn:* property is different to setting the
> value of any other property, and there will not necessarily be an
> error message. (For instance, imagine extracting an svn:ignore value
> from a dumpfile into a temporary file and then using «svn propset -F»
> on the temporary file, while the local encoding is ISO-8859-1. That
> will silently succeed.)

Indeed, it is not intuitive.
 
> - The remark in prop_value_conversion() about a 'last \n' has bitrotted.
> [It should have been removed in r845498 (= r5424) — which,
> coincidentally, is also the revision in which binary_props() was
> added.]

I agreed, but did not touched prop_tests.py in 1882105.

Cheers,

-- 
Yasuhito FUTATSUKI <futatuki_at_yf.bsclub.org>
Received on 2020-10-04 14:58:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.