Re: Let's discuss about unicode compositions for filenames!

From: Branko Čibej <brane_at_apache.org>
Date: Tue, 07 Feb 2012 14:43:19 +0100

On 07.02.2012 14:30, Hiroaki Nakamura wrote:
> 2012/2/7 Branko Čibej <brane_at_apache.org>:
>> On 06.02.2012 22:26, Hiroaki Nakamura wrote:
>>> The Unicode Standard says canonical equivalent sequences should be
>>> interpreted the same way.
>>> * 1.1 Canonical and Compatibility Equivalence
>>> http://unicode.org/reports/tr15/#Canonical_Equivalence
>>> * 2.12 Equivalent Sequences and Normalization
>>> http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf
>>> So we should not have the same name multiple times in repositories
>>> and working copies. Therefore Subversion servers and clients do
>>> not need to handle them.
>> *sigh*
>> I don't give a gnat's whisker what the Unicode Standard says. I'm only
>> interested in real-world situations. Or are you implying that, e.g., the
>> Unix VFS layer will magically detect file name equality of different
>> (de)normalized forms? Because it won't.
>> -- Brane
> I'm interested in real-world situations, too. The reality is that
> we need to avoid having the same filename in different forms, because
> that confuses users so much.
> I don't think we should expect file systems to detect filename equality
> across different forms. Mac OS X HFS+ can have only NFD filenames,
> and we must cope with that. And as you say, the standard file systems
> on Linux and Windows do not magically detect file name equality
> across different forms. It's also the reality that we cannot force users
> to format their hard disks and change file systems.
> So the communication layer must take care of this problem to provide
> interoperability among Windows, Linux and Mac.
> Subversion to the rescue!

I agree with all of that. The point I was trying to make, and which
Stefan spelled out a lot better, is that the existing MacPorts/Homebrew
patch is not a real solution (that's despite the fact that I use it
myself). The client-side mapping table is a more general solution, if a
lot harder to implement.
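
To make the idea concrete, here is a minimal sketch of such a table, written in Python purely for illustration (Subversion itself is C). The function name build_name_map and the sample file names are hypothetical; the only real API used is Python's unicodedata.normalize. It assumes the local file system stores names in NFD, as described above for HFS+.

```python
import unicodedata

def build_name_map(repo_names, local_form="NFD"):
    """Hypothetical helper: map each local (normalized) name to the
    repository names that would end up under it on disk."""
    local_to_repo = {}
    for repo_name in repo_names:
        local_name = unicodedata.normalize(local_form, repo_name)
        local_to_repo.setdefault(local_name, []).append(repo_name)
    return local_to_repo

# The same visible name, once precomposed (NFC) and once decomposed (NFD).
nfc = "caf\u00e9.txt"    # 'e' with acute as a single code point
nfd = "cafe\u0301.txt"   # 'e' followed by a combining acute accent

name_map = build_name_map([nfc, nfd, "readme.txt"])

# Entries with more than one repository name are the ones a client
# would have to disambiguate instead of silently overwriting.
collisions = {k: v for k, v in name_map.items() if len(v) > 1}
```

The table keeps the repository spelling untouched and records the local spelling next to it, so the client can translate in both directions rather than rewriting names on the way in.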

But the mapping table brings additional benefits: we could use it to,
e.g., transliterate characters that are allowed by some file systems but
not by others. For example, on Unix, file names may contain colons, but
on Windows they can't. We could even use the mapping table to decorate
local files that differ only in case on case-insensitive file systems.
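
A rough sketch of those two extra uses, again in Python for illustration only. The escape scheme (percent-style codes for ':' and '%') and the "~N" decoration suffix are assumptions made up for this example, not anything Subversion implements; the helper names are hypothetical too.

```python
import re

# Assumed escape scheme: '%' is escaped as well so decoding is unambiguous.
ESCAPES = {":": "%3A", "%": "%25"}
UNESCAPES = {code: char for char, code in ESCAPES.items()}

def encode_forbidden(repo_name):
    """Replace characters the local file system rejects with escape codes."""
    return "".join(ESCAPES.get(ch, ch) for ch in repo_name)

def decode_forbidden(local_name):
    """Invert encode_forbidden()."""
    return re.sub(r"%(?:3A|25)", lambda m: UNESCAPES[m.group(0)], local_name)

def decorate_case_collisions(repo_names):
    """Map repository names to local names, adding a '~N' suffix to every
    name after the first among those that differ only in case."""
    seen = {}
    mapping = {}
    for name in repo_names:
        key = name.casefold()
        count = seen.get(key, 0)
        seen[key] = count + 1
        mapping[name] = name if count == 0 else f"{name}~{count}"
    return mapping

local = encode_forbidden("notes:2012%.txt")
case_map = decorate_case_collisions(["README", "readme", "Makefile"])
```

A real implementation would also have to make sure a decorated name does not itself clash with an existing file, which this sketch does not check.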

-- Brane
Received on 2012-02-07 14:43:58 CET

This is an archived mail posted to the Subversion Dev mailing list.