Re: [Issue 2194] Unicode UTF-16 files detected as binary

From: Branko Čibej <brane_at_xbc.nu>
Date: 2005-01-09 21:53:49 CET

Barry Scott wrote:

>
> On Jan 6, 2005, at 01:41, Branko Čibej wrote:
>
>> Peter N. Lundblad wrote:
>>
>>> On Wed, 5 Jan 2005, Max Bowsher wrote:
>>>
>>>
>>>> Peter N. Lundblad wrote:
>>>> I agree with what you are saying, but what 2194 was saying was "UTF-16
>>>> should be detected as textual".
>>>>
>>>>
>>> Yes, it is more complicated than that, since it is an encoding where a
>>> line break is not one or two bytes, and for some other reasons.
>>> Still, I think we really need to support Unicode encodings other than
>>> UTF-8, like we support other 8-bit encodings.
>>>
>> It is much more complicated than that. If we're to treat UTF-16 files
>> as text, we have to teach libsvn_diff to do diffs and merges
>> correctly on such files, and possibly enhance keyword expansion and
>> newline conversion, too.
>>
>> In short, it's a whole can of worms that probably affects 90% of the
>> client-side code.
>
>
> When the rewrite of the client eventually happens, design wide-char
> support in from day 1, then.

This won't help in general. Only conversions between the various Unicode
encodings are guaranteed to round-trip exactly. If the file is in some
other encoding, there's not always a valid way to convert the contents to
Unicode, operate on that, and convert back without changing some of the
original characters that shouldn't have changed. For example, the
various ISO-2022 encodings are notorious for not behaving nicely in this
context, and for that matter so is UTF-7.
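
To make the round-trip problem concrete, here is a small Python sketch.
It is only an illustration: the helper name survives_round_trip is made
up for this example, and Python's built-in codecs stand in for whatever
conversion layer a client would actually use. UTF-7 allows several byte
spellings of the same text, so decoding and re-encoding can silently
rewrite bytes that nobody asked to change.

```python
def survives_round_trip(raw: bytes, encoding: str) -> bool:
    """Return True if decoding to Unicode and re-encoding gives back the same bytes."""
    return raw.decode(encoding).encode(encoding) == raw


# "+AEE-" is a valid, if unusual, UTF-7 spelling of the single letter "A"
# (base64 of the UTF-16 code unit 0x0041 inside a shift sequence).
original = b"+AEE-"

print(original.decode("utf-7"))                # the decoded text
print(original.decode("utf-7").encode("utf-7"))  # the re-encoded bytes
print(survives_round_trip(original, "utf-7"))

# A conversion that starts and ends in a Unicode encoding is not affected.
print(survives_round_trip("A".encode("utf-16-le"), "utf-16-le"))
```

The text is the same before and after, but the bytes on disk are not,
which is exactly the kind of change a version control client must not
introduce on its own.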

The only universally correct way is to find the replaceable strings
*without* converting the file contents, then only convert the
replacements once from Unicode to the file's encoding.
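
A rough Python sketch of that approach, under stated assumptions: the
function expand_keyword, its parameters, and the "$Rev$" style marker
are hypothetical and are not taken from the Subversion sources. Only the
marker and the replacement are encoded into the file's encoding; the
file contents are never decoded, so every byte outside a match is copied
through untouched. The unit parameter is an assumed way to reject
matches that do not start on a code unit boundary (2 for UTF-16).

```python
def expand_keyword(raw: bytes, encoding: str, keyword: str, value: str,
                   unit: int = 1) -> bytes:
    """Expand $keyword$ in raw file bytes without decoding the file.

    `encoding` must be a BOM-free codec name (e.g. "utf-16-le", not
    "utf-16"), otherwise the encoded marker would start with a BOM.
    `unit` is the code unit size in bytes; matches that do not start
    on a multiple of `unit` are ignored.
    """
    marker = f"${keyword}$".encode(encoding)
    replacement = f"${keyword}: {value} $".encode(encoding)

    out = bytearray()
    copied = 0   # everything before this offset is already in `out`
    search = 0   # where the next search starts
    while (hit := raw.find(marker, search)) >= 0:
        if hit % unit:
            # The bytes match but straddle two characters: not a real marker.
            search = hit + 1
            continue
        out += raw[copied:hit]      # untouched bytes, copied verbatim
        out += replacement          # converted once, from Unicode
        copied = search = hit + len(marker)
    out += raw[copied:]
    return bytes(out)


# Usage: a little-endian UTF-16 file with a BOM and a CRLF line ending.
utf16_file = "\ufeffrev: $Rev$\r\n".encode("utf-16-le")
print(expand_keyword(utf16_file, "utf-16-le", "Rev", "42", unit=2))
```

This sketch does not cover stateful encodings such as the ISO-2022
family, where the same bytes can mean different things depending on the
preceding escape sequences; a real implementation would have to track
that state while scanning.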

> I do not expect a quick fix, but this issue should be nagging at the
> svn devs.

Not to worry, it's in the issue tracker. :-)

-- Brane

Received on Sun Jan 9 21:54:02 2005