On 06.01.2019 19:37, Branko Čibej wrote:
> Windows default is UTF-16-LE, at least on x86(_64) and other
> little-endian architectures. I'm not sure what they do on ARM but I'd be
> surprised if Windows doesn't put it in little-endian mode, given that
> decades of legacy software assume little-endian.
>
> A simple check would be:
>
> * if 0x0a is on an odd offset, and the next byte is 0x00, then it's a
> UTF-16-LE linefeed;
> * else if 0x0a is on an even offset, and the _previous_ byte is 0x00,
> then it's a UTF-16-BE linefeed;
> * otherwise just hope it's a linefeed and move on.
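A minimal sketch of that check in C (the helper name and signature are mine, not Subversion's). Note the quoted steps read as counting offsets from 1; with the 0-based offsets usual in C, the odd/even tests flip:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: classify a 0x0a byte found at 0-based offset
 * `pos` in `buf`, following the heuristic quoted above (which counts
 * offsets from 1, so the odd/even tests are flipped here).
 * Returns 1 for a probable UTF-16-LE linefeed, 2 for UTF-16-BE,
 * 0 for "just hope it's a plain linefeed and move on". */
static int classify_lf(const unsigned char *buf, size_t len, size_t pos)
{
  if (pos >= len || buf[pos] != 0x0a)
    return -1;  /* caller error: not a linefeed at all */

  /* even 0-based offset, next byte NUL: low byte of 0x000A in LE order */
  if (pos % 2 == 0 && pos + 1 < len && buf[pos + 1] == 0x00)
    return 1;

  /* odd 0-based offset, previous byte NUL: 00 0A in BE order */
  if (pos % 2 == 1 && buf[pos - 1] == 0x00)
    return 2;

  return 0;
}
```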
Looks good for well-formed files. But I have two files right here that are
UTF-8 encoded yet also contain null bytes: the stupid app that writes
them uses the null byte to separate lines and LF to separate paragraphs.
For files like these, the check would fail.
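To make that failure mode concrete, here is a sketch with an invented buffer (the app and its exact output aren't named in the thread) in that NUL-for-lines, LF-for-paragraphs format. When the paragraph-ending 0x0a happens to sit next to a NUL with the "right" byte parity, the quoted heuristic misreads a plain UTF-8 linefeed as a UTF-16 one:

```c
#include <assert.h>
#include <stddef.h>

/* The quoted heuristic, collapsed to a yes/no answer, with 0-based
 * offsets (the quote counts from 1, so odd/even are flipped here). */
static int looks_like_utf16_lf(const unsigned char *buf, size_t len,
                               size_t pos)
{
  if (pos % 2 == 0 && pos + 1 < len && buf[pos + 1] == 0x00)
    return 1;  /* would be taken for a UTF-16-LE linefeed */
  if (pos % 2 == 1 && buf[pos - 1] == 0x00)
    return 1;  /* would be taken for a UTF-16-BE linefeed */
  return 0;
}

/* Invented sample: "ab" NUL LF "cd" -- a line terminated by the app's
 * NUL separator, immediately followed by a paragraph-separating LF. */
static const unsigned char sample[] = { 'a', 'b', 0x00, 0x0a, 'c', 'd' };
```

Here the LF sits at 0-based offset 3 (odd) with a NUL at offset 2, so `looks_like_utf16_lf(sample, sizeof sample, 3)` returns 1: a plain UTF-8 linefeed taken for UTF-16-BE. Whether any given linefeed is misread depends only on its byte parity, which is arbitrary in UTF-8 text.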
>
> You're right about that. I wouldn't dream of supporting such things
> within the blame callback itself. However it would still be nice to at
> least document what's happening.
I'll add some more info to the doc string and resend the patch tomorrow.
The advantage of doing this in a UI client: if the encoding detection
is wrong, I can just add a button/combobox/whatever so the user can
choose the right encoding right there when the blame output is shown.
If the detection is done in the svn lib and is wrong, a UI client
couldn't do that.
Stefan
Received on 2019-01-06 20:11:07 CET