Re: Let's discuss about unicode compositions for filenames!

From: Peter Samuelson <peter_at_p12n.org>
Date: Mon, 30 Jan 2012 17:14:18 -0600

[Stefan Sperling]
> It is indeed harder because we are passing paths verbatim to sqlite.
> I doubt having more than one form of a given path in wc.db is fun...

That's the implementation I would like to see, to be honest. Start
with the observation that we can treat Mac OS X NFD paths as a client
character encoding. Now observe that it is lossy. But ... almost all
non-Unicode client charsets are equally lossy, for exactly the same
reason!
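
To make the lossiness concrete, here is a minimal sketch using Python's standard unicodedata module. The function to_wc_path is a hypothetical stand-in for what a decomposing filesystem does to a name; plain NFD is used as an approximation, and the actual OS X normalization may differ in detail. Two server paths that differ byte for byte end up as the same wc path, so the server form cannot be recovered from the wc path alone.

```python
import unicodedata

# Two distinct server-side spellings of the same visible filename.
server_nfc = "caf\u00e9.txt"    # precomposed: U+00E9
server_nfd = "cafe\u0301.txt"   # decomposed: 'e' followed by U+0301

def to_wc_path(server_path):
    # Hypothetical helper: approximates a filesystem that stores
    # names in decomposed form (assumption: plain NFD).
    return unicodedata.normalize("NFD", server_path)

# The server sees two different paths ...
distinct_on_server = server_nfc != server_nfd
# ... but both collapse to a single path in the working copy.
same_in_wc = to_wc_path(server_nfc) == to_wc_path(server_nfd)
```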

This suggests maintaining a mapping table in wc.db between server paths
(UTF-8, unspecified NF) and wc paths (local charset, which is
occasionally UTF-8 with NFD).

This mapping table would be maintained any time we write to the wc.
It would be consulted any time we search for files in the wc.

It's not really extra work - we have to do those UTF-8 <-> local
charset conversions all the time anyway. This would in fact cache
those conversions.
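
A rough sketch of such a mapping table, using an in-memory SQLite database from Python. The table name path_map, its columns, and the helpers record_write and server_paths_for are all made up for illustration and are not the actual wc.db schema; the local charset is assumed to be UTF-8 with NFD.

```python
import sqlite3
import unicodedata

def open_map():
    # Hypothetical schema: one row per server path, plus the wc path
    # that was actually written for it.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE path_map ("
        " server_path TEXT PRIMARY KEY,"
        " wc_path TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX path_map_wc ON path_map (wc_path)")
    return db

def to_local(server_path):
    # Assumption: the local "charset" is UTF-8 in NFD.
    return unicodedata.normalize("NFD", server_path)

def record_write(db, server_path):
    # Called whenever a path is written to the wc: caches the conversion.
    wc_path = to_local(server_path)
    db.execute(
        "INSERT OR REPLACE INTO path_map VALUES (?, ?)",
        (server_path, wc_path),
    )
    return wc_path

def server_paths_for(db, wc_path):
    # Called whenever a file found in the wc must be matched to the
    # server path(s) it came from.
    rows = db.execute(
        "SELECT server_path FROM path_map"
        " WHERE wc_path = ? ORDER BY server_path",
        (wc_path,),
    )
    return [row[0] for row in rows]
```

The lookup returns a list because, in principle, more than one server path could map to the same wc path; how to handle that collision is a separate question.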

The implementation on OS X might be a bit hairy, if there isn't an easy
way to retrieve the real pathname of the file you just created.
Anywhere else, we just store the pathname we just calculated.
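
One possible way to find the real name, sketched in Python: create the file, then list the directory and pick the entry that equals the requested name after normalization. The helper create_and_find_real_name is hypothetical, and the approach assumes that comparing NFC forms is enough to identify the file, which would not hold if the directory already contained another entry with the same normalized name.

```python
import os
import unicodedata

def create_and_find_real_name(directory, name):
    # Create the file under the name we calculated; "x" fails if it exists.
    with open(os.path.join(directory, name), "x"):
        pass
    # Ask the filesystem what it actually stored, comparing in a
    # normalization-insensitive way (assumption: NFC equality suffices).
    wanted = unicodedata.normalize("NFC", name)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry
    raise FileNotFoundError(name)
```

On a filesystem that stores names verbatim this returns the name unchanged; on a decomposing filesystem it would be expected to return the decomposed form, which is what the mapping table should record.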

Peter
Received on 2012-01-31 00:15:03 CET

This is an archived mail posted to the Subversion Dev mailing list.
