
Re: [Issue 2194] Unicode UTF-16 files detected as binary

From: Branko Čibej <brane_at_xbc.nu>
Date: 2005-01-09 21:53:49 CET

Barry Scott wrote:

> On Jan 6, 2005, at 01:41, Branko Čibej wrote:
>> Peter N. Lundblad wrote:
>>> On Wed, 5 Jan 2005, Max Bowsher wrote:
>>>> Peter N. Lundblad wrote:
>>>> I agree with what you are saying, but what 2194 was saying was "UTF-16
>>>> should be detected as textual".
>>> Yes, it is more complicated than that, since it is an encoding where a
>>> line break is not one or two bytes, and for some other reasons. Still, I
>>> think we really need to support other Unicode encodings than UTF-8, like
>>> we support other 8-bit encodings.
>> It is much more complicated than that. If we're to treat UTF-16 files
>> as text, we have to teach libsvn_diff to do diffs and merges
>> correctly on such files, and possibly enhance keyword expansion and
>> newline conversion, too.
>> In short, it's a whole can of worms that probably affects 90% of the
>> client-side code.
> When the rewrite of the client eventually happens, design wide-char
> support in from day 1, then.

This won't help in general. You can only guarantee identical conversions
between the various Unicode encodings, but if the file is in some other
encoding, there's not always a valid way to convert the contents to
Unicode, operate on that, and convert back without changing some of the
original characters that shouldn't have changed. For example, the
various ISO-2022 encodings are notorious for not behaving nicely in this
context, and for that matter so is UTF-7.
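To illustrate the round-trip problem, here is a minimal Python sketch. It assumes Python's built-in "utf-7" codec; the sample bytes are made up for the example. UTF-7 allows the same characters to be written either directly or inside a base64 run, so decoding to Unicode and encoding back need not reproduce the original bytes.

```python
# Hypothetical sample: the text "abc" stored as a UTF-7 base64 run
# instead of as plain ASCII. Both spellings are valid UTF-7.
original = b"+AGEAYgBj-"

# Convert to Unicode, do nothing at all, and convert back.
decoded = original.decode("utf-7")
roundtrip = decoded.encode("utf-7")

# The characters are the same, but the bytes on disk would change,
# even though the operation in between did not touch the text.
print(decoded)
print(original, "->", roundtrip)
```

A diff or merge that worked this way would report changes in regions of the file that nobody edited.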

The only universally correct way is to find the replaceable strings
*without* converting the file contents, then only convert the
replacements once from Unicode to the file's encoding.
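A rough sketch of that approach for keyword expansion, in Python. The function name `expand_keyword`, the sample file contents, and the revision value are all hypothetical; this is not how libsvn does it. It also assumes the encoding is known and that the marker encodes to the same bytes wherever it appears, which does not hold for every stateful encoding.

```python
def expand_keyword(data: bytes, encoding: str, keyword: str, value: str) -> bytes:
    """Expand $keyword$ in data without decoding the file contents.

    Only the marker and its replacement are converted from Unicode to
    the file's encoding; every other byte of the file is left as it is.
    """
    # Convert the search string and the replacement once.
    marker = f"${keyword}$".encode(encoding)
    expanded = f"${keyword}: {value} $".encode(encoding)
    # Plain byte-level substitution. A real implementation would also
    # have to respect code-unit alignment in multi-byte encodings.
    return data.replace(marker, expanded)


# Hypothetical UTF-16 (little-endian, no BOM) file with CRLF line endings.
original = "x = 1\r\n$Rev$\r\n".encode("utf-16-le")
result = expand_keyword(original, "utf-16-le", "Rev", "42")
print(result.decode("utf-16-le"))
```

The explicit "utf-16-le" codec is used here so that encoding the marker does not prepend a byte order mark to it.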

> I do not expect a quick fix, but this issue should be nagging at svn
> devs.

Not to worry, it's in the issue tracker. :-)

-- Brane

Received on Sun Jan 9 21:54:02 2005

This is an archived mail posted to the Subversion Dev mailing list.
