Re: extending the blame callback

From: Branko ÄŒibej <brane_at_apache.org>
Date: Sun, 6 Jan 2019 19:37:14 +0100

On 06.01.2019 17:16, Stefan Kueng wrote:
>
>
> On 06.01.2019 15:05, Branko ÄŒibej wrote:
>
>> Sorry about getting into this late, but as Julian said:
>>
>>> * we already have a (char*, len) wrapper, called svn_string_t, so I
>>> would expect it would be neatest to use that;
>>
>> but the patch has:
>>
>>> @@ -758,6 +758,28 @@
>>> Â·*Â·willÂ·beÂ·trueÂ·ifÂ·theÂ·reasonÂ·thereÂ·isÂ·noÂ·blameÂ·informationÂ·isÂ·thatÂ·theÂ·line
>>>
>>>
>>> Â·*Â·wasÂ·modifiedÂ·locally.Â·InÂ·allÂ·otherÂ·casesÂ·@aÂ·local_changeÂ·willÂ·beÂ·false.
>>>
>>>
>>> Â·*
>>> +Â·*Â·@sinceÂ·NewÂ·inÂ·1.12.
>>> +Â·*/
>>> +typedefÂ·svn_error_tÂ·*(*svn_client_blame_receiver4_t)(
>>> +Â·Â·voidÂ·*baton,
>>> +Â·Â·svn_revnum_tÂ·start_revnum,
>>> +Â·Â·svn_revnum_tÂ·end_revnum,
>>> +Â·Â·apr_int64_tÂ·line_no,
>>> +Â·Â·svn_revnum_tÂ·revision,
>>> +Â·Â·apr_hash_tÂ·*rev_props,
>>> +Â·Â·svn_revnum_tÂ·merged_revision,
>>> +Â·Â·apr_hash_tÂ·*merged_rev_props,
>>> +Â·Â·constÂ·charÂ·*merged_path,
>>> +Â·Â·svn_stringbuf_tÂ·*line,
>>> +Â·Â·svn_boolean_tÂ·local_change,
>>> +Â·Â·apr_pool_tÂ·*pool);
>>
>>
>> The svn_stringbuf_t struct is not appropriate here; please use
>> svn_string_t. The former is used as a buffer to construct larger
>> strings, that's why it contains a reference to a pool and a blocksize.
>> the blame receiver gets data that are already allocated from a pool and
>> should be immutable to the receiver; the appropriate declaration here is
>>
>> Â Â Â Â const svn_string_t *line;
>
> good point.
>
>> as you only need the data pointer and the length, and very much do _not_
>> need the pool and blocksize, nor do you want the data or length to be
>> modifiable by the callback.
>>
>> That will make the changes in libsvn_client/blame.c a bit more complex
>> since you'll have to create a local svn_string_t object (on stack)
>> before calling the receiver, but it makes the receiver's interface
>> cleaner. Something like this would work:
>
> see attached patch.
>
>> Other than that, I think it's a bad idea to leave the trailing null byte
>> from the UTF-16-LE representation of U+000a at the beginning of the next
>> "line". If we're making this change in a *public* API, we should not
>> expect all users to hack around such a misfeature.
>
> problem is: if we just skip any null char after an lf, then we would
> screw up utf16be encodings (or le? I always get those two confused).

Windows default is UTF-16-LE, at least on x86(_64) and other
little-endian architectures. I'm not sure what they do on ARM but I'd be
surprised if Windows doesn't put it in little-endian mode, given that
decades of legacy software assume little-endian.

A simple check would be:

  * if 0x0a is on an odd offset, and the next byte is 0x00, then it's a
    UTF-16-LE linefeed;
  * else if 0x0a is on an even offset, and the _previous_ byte is 0x00,
    then it's a UTF-16-BE linefeed;
  * otherwise just hope it's a linefeed and move on.

But given your example below of mixing ASCII/UTF-8/UTF-16, it's better
to just leave well enough alone. :(

>> Until we have better support for other Unicode representations, we
>> should detect that null byte and include it as part of the reported
>> blame line.
>
> another situation which unfortunately is not that uncommon: files that
> have different encodings in different lines or even inside the same
> line. I've seen that especially in log files where the logger first
> prints timestamp in ascii, but then the log-text is in utf8

... which isn't a problem since ASCII and UTF-8 are compatible ...

> or even utf16.

... but this is a rather huge problem.

> Such situations I think would require too much detection logic for the
> blame function and should be left to clients - they might not be able
> to even show such lines so why bother trying to do the detection right
> for that.

You're right about that. I wouldn't dream of supporting such things
within the blame callback itself. However it would still be nice to at
least document what's happening.

The patch itself looks OK to me now.

-- Brane
Received on 2019-01-06 19:37:24 CET

This message: [ Message body ]
Next message: Daniel Shahaf: "Re: extending the blame callback"
Previous message: Stefan Kueng: "Re: extending the blame callback"
In reply to: Stefan Kueng: "Re: extending the blame callback"
Next in thread: Daniel Shahaf: "Re: extending the blame callback"
Reply: Daniel Shahaf: "Re: extending the blame callback"
Reply: Stefan Kueng: "Re: extending the blame callback"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]