Re: Standardizing on UTF8 internally isn't enough

From: Mark Phippard <markphip_at_gmail.com>
Date: 2007-07-17 22:28:05 CEST

On 7/17/07, Erik Huelsmann <ehuels@gmail.com> wrote:
> Because we defined we'd be using UTF-8, but didn't define we'd be
> using NFC or NFD, we now have a problem: we defined where we wanted
> the door in our house, but not what its size would be :-)

So if I create a new file with an umlaut in its name on a Mac, can I add and commit it
to Subversion? Does it store it as NFD in the repository? Or does it
all fail before this?

I am trying to determine whether we have adopted NFC as a convention (even if
unintentional) or whether our repositories potentially have a mixture
in them.
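
To make the question concrete, here is a minimal Python sketch (not
Subversion code) that classifies a path name by normal form. The helper
name normal_form and the sample file names are assumptions made up for
illustration; only the standard unicodedata module is used.

```python
import unicodedata


def normal_form(name: str) -> str:
    """Report which Unicode normal form(s) a path name is already in."""
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        return "both"      # e.g. plain ASCII, identical in either form
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"       # composed and decomposed characters mixed


# Hypothetical file names: the same visible text, different code points.
composed = "M\u00fcller.txt"     # u-umlaut as a single code point
decomposed = "Mu\u0308ller.txt"  # 'u' followed by a combining diaeresis

for path in (composed, decomposed, "README"):
    print(normal_form(path), path.encode("utf-8"))
```

Running something like this over the path names in a dump would show
whether a repository holds only one form or a mixture, since the two
spellings of the umlaut encode to different UTF-8 byte sequences.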

> We are not the only project with this problem however (and note that
> it's not the Mac which causes this problem, using Unicode is): IBM
> created the ICU project (http://icu-project.org/) to address all kinds
> of i18n problems including Unicode normalization, collation etc.
> This problem could be solved too by adding the ICU lib as a dependency
> and changing all path comparisons to use the ICU normal-form-agnostic
> comparison routines.
> I hope this explains the problem (and problem domain) to anybody who
> never delved into Unicode before.

This is all good information. Do you have a proposal to make?
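
As a rough illustration of what a normal-form-agnostic comparison means,
here is a Python sketch that normalizes both sides before comparing. It
is a stand-in for the idea only, not ICU's API and not Subversion code;
the function name same_path is an assumption.

```python
import unicodedata


def same_path(a: str, b: str) -> bool:
    """Compare two path names while ignoring NFC/NFD differences.

    Both sides are brought to NFC first; normalizing both to NFD
    would work equally well for an equality test.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical names: composed vs. decomposed spelling of the same file.
nfc_name = "M\u00fcller.txt"
nfd_name = "Mu\u0308ller.txt"

print(nfc_name == nfd_name)           # plain comparison sees two names
print(same_path(nfc_name, nfd_name))  # normalized comparison sees one
```

The trade-off is that every comparison pays for normalization, whereas
standardizing on one form in the repository pays that cost once on the
way in.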

It seems that unless we assume repositories currently contain NFC and
standardize on that, we would have to dump/load if we want to enforce
a standardized form?

If we have to dump/load then it seems like NFD would be the best
choice as apparently that is the version that is actively receiving
new characters?

Mark Phippard
Received on Tue Jul 17 22:27:17 2007

This is an archived mail posted to the Subversion Dev mailing list.
