On 2020/10/05 1:57, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Sun, 04 Oct 2020 21:56 +0900:
>> On 2020/09/26 19:12, Daniel Shahaf wrote:
>>> 1 % svn propset svn:ignore "予定表.txt" ./
>>> 2 property 'svn:ignore' set on '.'
>>> 3 % svn propset foo:ignore "予定表.txt" ./
>>> 4 property 'foo:ignore' set on '.'
>>> 5 % LC_ALL=ja_JP.eucjp svn pl -v
>>> 6 Properties on '.':
>>> 7 foo:ignore
>>> 8 予定表.txt
>>> 9 svn:ignore
>>> 10 ͽɽ.txt
>>>
>>> 11 % LC_ALL=C svn pg --strict svn:ignore
>>> 12 {U+4E88}{U+5B9A}{U+8868}.txt
>>>
>>> 13 % svn propset svn:ignore "{U+4E88}.txt" ./
>>> 14 property 'svn:ignore' set on '.'
>>> 15 % sqlite3 .svn/wc.db .dump | me
>>> 16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
>>> 17 % svn pg --strict svn:ignore
>>> 18 {U+4E88}{U+5B9A}{U+8868}.txt
>>> .
>>> So, I think there are a number of different issues/gotchas here:
>>>
>>> - It's not possible to get the raw value of an svn:* property in
>>> a working copy if the value is not representable in the local encoding.
>>
>> I believe that if we want to get property values precisely, we should
>> use XML output, although --no-newline is enough in most cases except
>> this one.
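To make the ambiguity of the ad-hoc escaping concrete, here is a minimal Python sketch. It is not Subversion code: escape_for_ascii() is a hypothetical helper that only imitates the {U+XXXX} form seen in the LC_ALL=C output above, on the assumption that every non-ASCII character is replaced that way. It shows that a value containing real kanji and a value containing the literal escape text print identically.

```python
def escape_for_ascii(text: str) -> str:
    """Hypothetical imitation of the {U+XXXX} escaping shown in the transcript.

    Assumption: every character outside ASCII is replaced by {U+XXXX};
    ASCII characters (including '{' and '}') pass through unchanged.
    """
    return "".join(
        ch if ord(ch) < 128 else "{U+%04X}" % ord(ch)
        for ch in text
    )


# A value that really contains the three kanji.
real_value = "\u4e88\u5b9a\u8868.txt"
# A value that contains the escape sequences as plain ASCII text.
literal_value = "{U+4E88}{U+5B9A}{U+8868}.txt"

escaped_real = escape_for_ascii(real_value)
escaped_literal = escape_for_ascii(literal_value)

print(escaped_real)
print(escaped_literal)
# The two different stored values are indistinguishable once escaped.
print(escaped_real == escaped_literal)
```

Since the escaping is not reversible, the printed text alone cannot tell a reader which of the two values is stored.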
>
> Hmm, that's an interesting one. On the one hand, «propget --xml»
> does resolve the ambiguity issue of the ad-hoc escaping; on the other
> hand:
>
> - We shouldn't require CLI users to use an XML parser in order to
> retrieve values of binary blobs.
Then do we need a new output format for "strict" values?
> - The XML document declares itself to be in UTF-8. Does that mean XML
> parsers are allowed to treat the dumped property values as UTF-8 and,
> for example, convert the byte sequence (that comprises the value) to
> another byte sequence, that's equivalent when treated as UTF-8 but
> not equivalent when treated as binary blobs? (For example, convert
> the UTF-8 to composed or decomposed normal form.)
At the very least, we expect parsing to leave the byte sequence unchanged
if svn_xml_is_xml_safe() considers the value safe. If that is not the
case, I think the output of --xml is broken.
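The normalization concern can be illustrated with a short Python sketch. It uses only the standard unicodedata module and a made-up file name; it says nothing about what any particular XML parser actually does. It only shows that the composed and decomposed forms of the same text are both valid UTF-8 and yet are different byte sequences.

```python
import unicodedata

# Hypothetical property value: HIRAGANA LETTER GA (U+304C) plus ".txt".
composed = unicodedata.normalize("NFC", "\u304c.txt")
# Decomposed form: HIRAGANA LETTER KA (U+304B) followed by the
# combining voiced sound mark (U+3099).
decomposed = unicodedata.normalize("NFD", composed)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes)
print(decomposed_bytes)
# Canonically equivalent as text, but not equal as binary blobs.
print(composed_bytes == decomposed_bytes)
```

A consumer that normalized the text before handing it back would therefore return a value of a different length than the one stored in the property.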
Moreover, since properties carry no metadata about their contents, we
can't determine whether a property is text or not, even if it contains
only printable characters, like 'eicar.com'[1]. So it would not be so
strange if we used base64 encoding for all properties (but I don't
think that is a good idea).
[1] https://svn.haxx.se/dev/archive-2016-03/0043.shtml
(Yes, I was also trapped by it yesterday.)
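For reference, a minimal Python sketch of what "base64 for all properties" would mean in practice. The helper names and sample values are hypothetical and this is not how svn formats its output; it only shows that base64 round-trips any byte sequence exactly, whether it is UTF-8 text, text in another encoding, or opaque data that merely looks printable.

```python
import base64


def encode_property(value: bytes) -> str:
    """Hypothetical helper: render a property value as base64 text."""
    return base64.b64encode(value).decode("ascii")


def decode_property(encoded: str) -> bytes:
    """Hypothetical helper: recover the exact original bytes."""
    return base64.b64decode(encoded, validate=True)


text = "\u4e88\u5b9a\u8868.txt\n"  # the file name used in the transcript
samples = [
    text.encode("utf-8"),            # UTF-8 text
    text.encode("euc_jp"),           # same text, different encoding
    b"printable-but-opaque-data",    # printable bytes, not necessarily text
    b"\x00\xff\x1b",                 # clearly binary
]

round_trip_ok = True
for raw in samples:
    encoded = encode_property(raw)
    round_trip_ok = round_trip_ok and decode_property(encoded) == raw
    print(len(raw), encoded)

print(round_trip_ok)
```

The cost is visible in the printed output: even plain text values stop being readable without a decoding step.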
Cheers,
--
Yasuhito FUTATSUKI <futatuki_at_yf.bsclub.org>
Received on 2020-10-05 15:30:57 CEST