
Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Fri, 3 Feb 2012 04:59:01 +0900

2012/2/3 Peter Samuelson <peter_at_p12n.org>:
>
> [Hiroaki Nakamura]
>> In option (2), we do n12n on all clients on all platforms, and we
>> include web_dav_svn in "clients". So we convert all input paths to
>> the "server encoding", which is NFC.
>
> Indeed. But the very concept of a "server encoding" means we are
> involving the server side. Which invokes a lot of difficult questions
> like "what about existing 1.x clients", "what about existing checkouts"
> and "what about existing repositories".

Svn 1.7 already forces users to upgrade existing 1.6 working copies.
So we can ask users to upgrade their working copies again.

For existing repositories, I think it would be better to convert them too,
using svnadmin dump/load, with svnadmin load changed to convert filenames
to NFC. In reality, however, we cannot force users to convert every existing
repository, so we need to change servers too: when servers read filenames
from repositories, they first convert them to NFC and then process commands.

We also need to change servers in order to deal with existing 1.x clients.
When web_dav_svn and svnserve receive filenames from clients, they must
first convert them to NFC.
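
To make the conversion step concrete, here is a minimal sketch in Python (not Subversion code). The function name to_nfc_path is hypothetical, and it assumes the server has already decoded the incoming path to a Unicode string:

```python
import unicodedata


def to_nfc_path(path: str) -> str:
    """Return the path with all characters converted to NFC.

    Hypothetical helper for illustration only. "/" never combines with
    other characters, so normalizing the whole string gives the same
    result as normalizing each path component separately.
    """
    return unicodedata.normalize("NFC", path)


# A path as an OS X client might send it: "e" followed by U+0301
# COMBINING ACUTE ACCENT (the decomposed, NFD spelling).
nfd_path = "trunk/cafe\u0301.txt"

# The same path in the proposed "server encoding" (NFC), where the
# accented letter is the single code point U+00E9.
nfc_path = to_nfc_path(nfd_path)
print(ascii(nfc_path))
```

Because NFC is idempotent, a path that is already in NFC passes through unchanged, so applying the conversion to every incoming path should be harmless for clients that already send NFC.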

>
> By proposing a client-only solution, I hope to avoid _all_ those
> questions. (Except "what about existing checkouts" - there would be a
> wc upgrade of some sort.) No recoding of existing repository paths is
> necessary. In my proposal, the only recoding that is done is on the
> client side, on a platform that does not support the original pathname
> (e.g., OS X HFS+ with a NFC path).
>
>> "All problems in computer science can be solved by another level of
>> indirection."
>
> Mostly true. I can't tell if you quoted that as a point of support for
> my proposal, or as a point against it.
>
>> Yes, with the mapping table, you can mangle filenames. However I
>> think it is too complex for novice users. Users must care about the
>> original filenames and the mangled filenames all the time.
>
> Well, there is no need to use this same proposal to also work around
> other filesystem limitations like avoiding ":" on Windows. It is just
> something that becomes _possible_.
>
>> Also you must adapt all clients to use the mapping table. That is
>> whole lot of work! Maybe you would create another version control
>> system.
>
> By "all clients" I guess you mean "all Subversion client libraries".
> Yes, that is the proposal. It would touch libsvn_wc and probably
> libsvn_client and libsvn_subr.

Yes, like I said above, "clients" actually includes components that
run on servers, like web_dav_svn. It should be read as any component
that accesses repositories or working copies.

We also need to change svnserve, so we'd better say "all servers and clients".

>
>> So even if Windows NTFS can have the same abstract filenames in both
>> NFC and NFD simultaneously, we should avoid that, and we should only
>> allow NFC filenames.
>
> This could be done, if we wanted to go to the trouble. Or we could
> just say "use a pre-commit hook," like we tell people who want to
> prevent REAMDE and Reamde in a single dir. It is not the same level of
> interoperability problem as the one this thread is about.

If you think of this by analogy with ASCII uppercase and lowercase,
you miss the point. Please reread Unicode Standard Annex #15
(UAX #15: Unicode Normalization Forms):
http://unicode.org/reports/tr15/

> Canonical equivalence is a fundamental equivalency between
> characters or sequences of characters that represent the same
> abstract character, and when correctly displayed should always
> have the same visual appearance and behavior. Figure 1 illustrates
> this equivalence.

So a filename in NFC and the same filename in NFD are canonically
equivalent: they are the same name.
README and readme are different names.
NFC/NFD and uppercase/lowercase are two different stories.
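
The difference can be checked with a few lines of Python. This is only an illustration; the helper name canonically_equivalent is made up for this example:

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent per UAX #15.

    Two strings are canonically equivalent exactly when their NFD
    (or, equally, their NFC) forms are identical.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


nfc_name = "caf\u00e9"    # precomposed U+00E9
nfd_name = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The code point sequences differ, yet they spell the same abstract name.
print(nfc_name == nfd_name)
print(canonically_equivalent(nfc_name, nfd_name))

# Case is a different story: normalization does not fold case.
print(canonically_equivalent("README", "readme"))
```

The first comparison is false, the second true, and the third false: normalization unifies the two spellings of the accented name but leaves README and readme distinct.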

Should we allow two canonically equivalent filenames in one directory?
Of course not! If we allow that, we get into real trouble and
confusion.

And OS X HFS+ does not allow that. So to support interoperability
with OS X, we should not allow it in Subversion either.
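
As a rough sketch of what such a check could look like (for example in a pre-commit hook or in the server itself), the Python below groups the names in one directory by their NFC form. The function name find_nfc_collisions is hypothetical and this is not existing Subversion code:

```python
import unicodedata
from collections import defaultdict


def find_nfc_collisions(names):
    """Group directory entries that become identical after NFC conversion.

    Returns a dict mapping each NFC form to the list of original names
    that share it, keeping only groups with more than one member.
    """
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    return {nfc: members for nfc, members in groups.items() if len(members) > 1}


entries = ["caf\u00e9.txt", "cafe\u0301.txt", "README", "readme"]
for nfc, members in find_nfc_collisions(entries).items():
    print(ascii(nfc), [ascii(m) for m in members])
```

Only the two spellings of the accented name are reported. README and readme are left alone, because case differences are outside the scope of normalization.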

-- 
)Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-02 20:59:34 CET

This is an archived mail posted to the Subversion Dev mailing list.