Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Tue, 7 Feb 2012 22:30:09 +0900

2012/2/7 Branko Čibej <brane_at_apache.org>:
> On 06.02.2012 22:26, Hiroaki Nakamura wrote:
>> The Unicode Standard says canonical equivalent sequences should be
>> interpreted the same way.
>> * 1.1 Canonical and Compatibility Equivalence
>>   http://unicode.org/reports/tr15/#Canonical_Equivalence
>> * 2.12 Equivalent Sequences and Normalization
>>   http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf
>>
>> So we should not have the same name multiple times in repositories
>> and working copies. Therefore Subversion servers and clients do
>> not need to handle them.
>
> *sigh*
>
> I don't give a gnat's whisker what the Unicode Standard says. I'm only
> interested in real-world situations. Or are you implying that, e.g., the
> Unix VFS layer will magically detect file name equality of different
> (de)normalized forms? Because it won't.
>
> -- Brane
>

I'm interested in real-world situations, too. In reality, we need to
avoid having the same filename stored in different normalization forms,
because that confuses users badly.

I don't think we expect file systems to detect filename equality
across different forms. Mac OS X HFS+ can store only NFD filenames,
and we must cope with that. And as you say, the standard file systems
on Linux and Windows do not magically detect file name equality
across different forms. It is also a reality that we cannot force users
to reformat their hard disks and change file systems.

So the communication layer must take care of this problem to provide
interoperability among Windows, Linux, and Mac.
Subversion to the rescue!
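
To make the problem concrete, here is a minimal sketch in Python. It is
not Subversion code: the helper name wire_name and the choice of NFC as
the common form are assumptions made only for illustration. It shows that
the same visible name has different bytes in NFC and NFD, and that
normalizing both sides to one form before comparing makes them equal.

```python
import unicodedata


def wire_name(name: str) -> str:
    """Hypothetical helper: map a filename to one agreed form (NFC assumed)."""
    return unicodedata.normalize("NFC", name)


# The same visible name, "café", in two canonically equivalent forms.
nfc = "caf\u00e9"    # precomposed U+00E9, typical on Linux and Windows
nfd = "cafe\u0301"   # "e" + combining acute U+0301, the decomposed form

# A byte-wise or code-point-wise comparison treats them as different names.
print(nfc == nfd)                          # False
print(nfc.encode("utf-8"))                 # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))                 # b'cafe\xcc\x81'

# After normalizing both to the same form, they compare equal.
print(wire_name(nfc) == wire_name(nfd))    # True
```

Whether the common form should be NFC or NFD is a separate design
question; the sketch only shows that agreeing on one form removes the
duplicate.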

-- 
)Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-07 14:30:43 CET

This is an archived mail posted to the Subversion Dev mailing list.