Re: UTF-8 NFC/NFD paths issue

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Thu, 16 Sep 2010 05:35:40 +0200

Erik Huelsmann wrote on Wed, Sep 15, 2010 at 23:20:06 +0200:
> Yesterday, I was talking to CMike about our long-standing issue with UTF-8
> strings designating a certain path not necessarily being equal to other
> strings designating the same path. The issue has to do with NFC (composed)
> and NFD (decomposed) representation of Unicode characters. CMike nicely
> called the issue the "Erik Huelsmann issue" yesterday :-)
>
> The issue consists of two parts:
> 1. The repository which should determine that paths being added by a commit
> are unique, regardless of their encoding (NFC/NFD)

Will you assume that all paths in the repository have been
Unicode-canonicalized prior to entering the repository?

If yes, then we infer that no two in-repository paths (which are
bytewise different) canonicalize to the same byte sequence. That is a
pretty useful precondition to have. Without it: what /can/ svn do on a
legacy repository where two paths are bytewise-different yet Unicode-equal?
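
To make the legacy-repository case concrete, here is a minimal Python
sketch of how such colliding paths could be detected. It is not
Subversion code: `find_normalization_collisions` is a hypothetical
helper, and picking NFC as the canonical form is an assumption made
only for illustration.

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(paths, form="NFC"):
    """Group paths that differ as strings but are equal after normalization.

    Hypothetical helper for illustration; `form` (NFC here) is an assumption.
    """
    groups = defaultdict(set)
    for path in paths:
        # Paths that are Unicode-equal share the same normalized key.
        groups[unicodedata.normalize(form, path)].add(path)
    # Only keys reached by more than one distinct spelling are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]

composed = "caf\u00e9.txt"     # U+00E9, precomposed e-acute (NFC)
decomposed = "cafe\u0301.txt"  # "e" followed by U+0301 combining acute (NFD)

collisions = find_normalization_collisions([composed, decomposed, "readme.txt"])
```

Any non-empty result marks a pair of entries for which the
precondition above does not hold.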

> 2. The client which should detect that the pathnames coming in from the
> filesystem may differ in encoding from what's in the working copy
> administrative files [this is mainly an issue on the Mac:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2464]
>
...
> Basically what I was trying to do is: do what we do now (ie fail if the path
> exists and succeed if it doesn't), with the only difference that the paths
> used for comparison are guaranteed to be the same normalization - meaning
> they are the same byte sequence when they're equal Unicode.
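
As a rough illustration of the behaviour described in the quoted
paragraph (fail if the path exists, succeed if it doesn't, comparing
only normalized forms), here is a small Python sketch. The names
`normalized_key` and `add_path` are hypothetical, the in-memory set
stands in for whatever the repository actually stores, and NFC is an
assumed choice of normalization form.

```python
import unicodedata

def normalized_key(path, form="NFC"):
    """Return the UTF-8 bytes of `path` after Unicode normalization."""
    return unicodedata.normalize(form, path).encode("utf-8")

def add_path(existing_keys, new_path):
    """Add `new_path` unless a Unicode-equal path is already present.

    `existing_keys` is a set of normalized byte keys (an assumption made
    for this sketch, not how Subversion stores paths).
    """
    key = normalized_key(new_path)
    if key in existing_keys:
        # Same behaviour as today for an existing path: the add fails.
        raise FileExistsError(new_path)
    existing_keys.add(key)

existing = set()
add_path(existing, "caf\u00e9")        # precomposed form: succeeds
try:
    add_path(existing, "cafe\u0301")   # decomposed form of the same path
    second_add_failed = False
except FileExistsError:
    second_add_failed = True
```

Because both spellings map to the same byte key, the second add is
rejected even though the two input strings are bytewise different.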
Received on 2010-09-16 05:39:51 CEST

This is an archived mail posted to the Subversion Dev mailing list.