On 06.01.2019 15:05, Branko Čibej wrote:
> Sorry about getting into this late, but as Julian said:
>
>> * we already have a (char*, len) wrapper, called svn_string_t, so I would expect it would be neatest to use that;
>
> but the patch has:
>
>> @@ -758,6 +758,28 @@
>> ·*·will·be·true·if·the·reason·there·is·no·blame·information·is·that·the·line
>>
>> ·*·was·modified·locally.·In·all·other·cases·@a·local_change·will·be·false.
>>
>> ·*
>> +·*·@since·New·in·1.12.
>> +·*/
>> +typedef·svn_error_t·*(*svn_client_blame_receiver4_t)(
>> +··void·*baton,
>> +··svn_revnum_t·start_revnum,
>> +··svn_revnum_t·end_revnum,
>> +··apr_int64_t·line_no,
>> +··svn_revnum_t·revision,
>> +··apr_hash_t·*rev_props,
>> +··svn_revnum_t·merged_revision,
>> +··apr_hash_t·*merged_rev_props,
>> +··const·char·*merged_path,
>> +··svn_stringbuf_t·*line,
>> +··svn_boolean_t·local_change,
>> +··apr_pool_t·*pool);
>
>
> The svn_stringbuf_t struct is not appropriate here; please use
> svn_string_t. The former is used as a buffer to construct larger
> strings, that's why it contains a reference to a pool and a blocksize.
> the blame receiver gets data that are already allocated from a pool and
> should be immutable to the receiver; the appropriate declaration here is
>
> const svn_string_t *line;
good point.
> as you only need the data pointer and the length, and very much do _not_
> need the pool and blocksize, nor do you want the data or length to be
> modifiable by the callback.
>
> That will make the changes in libsvn_client/blame.c a bit more complex
> since you'll have to create a local svn_string_t object (on stack)
> before calling the receiver, but it makes the receiver's interface
> cleaner. Something like this would work:
see attached patch.
> Other than that, I think it's a bad idea to leave the trailing null byte
> from the UTF-16-LE representation of U+000a at the beginning of the next
> "line". If we're making this change in a *public* API, we should not
> expect all users to hack around such a misfeature.
problem is: if we just skip any null char after an lf, then we would
screw up utf16be encodings (or le? I always get those two confused).
> Until we have better support for other Unicode representations, we
> should detect that null byte and include it as part of the reported
> blame line.
another situation which unfortunately is not that uncommon: files that
have different encodings in different lines or even inside the same
line. I've seen that especially in log files where the logger first
prints timestamp in ascii, but then the log-text is in utf8 or even
utf16. Such situations I think would require too much detection logic
for the blame function and should be left to clients - they might not be
able to even show such lines so why bother trying to do the detection
right for that.
Stefan
Received on 2019-01-06 17:23:52 CET