
Re: extending the blame callback

From: Branko Čibej <brane_at_apache.org>
Date: Mon, 7 Jan 2019 06:07:48 +0100

On 06.01.2019 20:10, Stefan Kueng wrote:
>
>
> On 06.01.2019 19:37, Branko Čibej wrote:
>
>> Windows default is UTF-16-LE, at least on x86(_64) and other
>> little-endian architectures. I'm not sure what they do on ARM but I'd be
>> surprised if Windows doesn't put it in little-endian mode, given that
>> decades of legacy software assume little-endian.
>>
>> A simple check would be:
>>
>>    * if 0x0a is on an even offset (counting from zero), and the next byte
>>      is 0x00, then it's a UTF-16-LE linefeed;
>>    * else if 0x0a is on an odd offset, and the _previous_ byte is 0x00,
>>      then it's a UTF-16-BE linefeed;
>>    * otherwise just hope it's a linefeed and move on.
>
> Looks good for proper files. But I have two files right here that are
> UTF-8 encoded but also contain null bytes: the stupid app that
> writes them uses the null byte to separate lines, and LF to separate
> paragraphs. So for these files the check would fail.

But not any more drastically than it fails now, surely? Since it returns
whole paragraphs instead of lines. You might lose an empty line in the
blame output here and there, hardly noticeable. :)
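
For illustration only, here is a minimal Python sketch of the byte-level check discussed above. It is not Subversion code; the function name `classify_linefeed` and its return strings are made up for this example. The sketch counts byte offsets from zero, so a UTF-16-LE linefeed (0x0a 0x00) has its 0x0a at an even offset and a UTF-16-BE linefeed (0x00 0x0a) has it at an odd offset.

```python
def classify_linefeed(buf: bytes, i: int) -> str:
    """Classify the 0x0a byte at zero-based offset i of buf.

    Hypothetical helper for illustration; the name and the returned
    labels are assumptions, not part of any Subversion API.
    """
    if buf[i] != 0x0A:
        raise ValueError("byte at offset i is not 0x0a")
    # UTF-16-LE encodes U+000A as 0a 00: 0x0a on an even offset, then 0x00.
    if i % 2 == 0 and i + 1 < len(buf) and buf[i + 1] == 0x00:
        return "utf-16-le"
    # UTF-16-BE encodes U+000A as 00 0a: 0x00, then 0x0a on an odd offset.
    # (i is odd here, so i - 1 is always a valid index.)
    if i % 2 == 1 and buf[i - 1] == 0x00:
        return "utf-16-be"
    # Otherwise just hope it's a linefeed and move on.
    return "plain"


le = "a\nb".encode("utf-16-le")   # 61 00 0a 00 62 00
be = "a\nb".encode("utf-16-be")   # 00 61 00 0a 00 62
print(classify_linefeed(le, le.index(b"\n")))
print(classify_linefeed(be, be.index(b"\n")))
print(classify_linefeed(b"a\nb", 1))
# A NUL-separated UTF-8 file can fool the check when a NUL sits right
# before an LF that happens to land on an odd offset:
print(classify_linefeed(b"ab\x00\ncd", 3))
```

The last call shows the kind of misdetection such files could trigger: the LF is taken for the second half of a UTF-16-BE linefeed, so the preceding NUL would be swallowed along with it.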

>> You're right about that. I wouldn't dream of supporting such things
>> within the blame callback itself. However it would still be nice to at
>> least document what's happening.
>
> I'll add some more info to the doc string and resend the patch tomorrow.
>
> The advantage if this is done in a UI client: if the detection of the
> encoding is wrong, I can just add a button/combobox/whatever so the
> user can choose the right encoding right there when showing the blame
> output.
> If the detection is done in the svn lib and is wrong, then a UI
> client could not do that.

OK, I guess, for now ... until we figure out how to do this right. For
example, when someone finally decides to properly handle Unicode
representations in file contents for diff, patch and blame, we'd also
have to support U+2028 (line separator) and U+2029 (paragraph separator).
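
To make the U+2028/U+2029 point concrete, here is a small Python sketch (illustrative only; the helper name `encoded_forms` and the sample text are made up). It shows that neither separator contains a 0x0a byte in UTF-8 or UTF-16, so a scan for linefeed bytes cannot see them, whereas Python's Unicode-aware `str.splitlines()` does treat them as line boundaries.

```python
def encoded_forms(ch: str) -> dict:
    """Return the hex encoding of ch in UTF-8 and both UTF-16 byte orders."""
    return {enc: ch.encode(enc).hex()
            for enc in ("utf-8", "utf-16-le", "utf-16-be")}


print(encoded_forms("\u2028"))  # LINE SEPARATOR
print(encoded_forms("\u2029"))  # PARAGRAPH SEPARATOR

# Sample text (an assumption for this example): two Unicode separators
# and one ordinary trailing linefeed.
text = "one\u2028two\u2029three\n"

# A byte-level scan for 0x0a finds only the trailing linefeed ...
print(text.encode("utf-8").count(b"\n"))
# ... while a Unicode-aware split sees three lines.
print(text.splitlines())
```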

-- Brane
Received on 2019-01-07 06:08:03 CET

This is an archived mail posted to the Subversion Dev mailing list.
