
Re: Let's discuss about unicode compositions for filenames!

From: Stefan Sperling <stsp_at_elego.de>
Date: Mon, 30 Jan 2012 20:10:05 +0100

On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote:
> 2012/1/30 Stefan Sperling <stsp_at_elego.de>:
> > My friend is not willing to upgrade to a new client version yet, which
> > is fine because all 1.x releases of Subversion clients are supposed
> > to be compatible with all 1.y releases of Subversion servers. He should
> > not have to upgrade his client just because the server was upgraded.
> >
> > In his working copy, the file name is also in NFD form. When he
> > talks to the server, the server provides the name in NFC. Because he
> > is using the old client the client has no way of knowing how to map
> > the NFC name to its local NFD file. So we've broken backwards
> > compatibility for my friend.
>
> I think we cannot avoid this. So this patch is for 2.x, which may
> break backward compatibility.

If we are ever going to break compatibility, this issue will
certainly be addressed by normalising all paths as you suggest.
It was an unfortunate oversight that no NFD/NFC normalisation
was implemented in the first place :(
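
To make the NFD/NFC difference concrete, here is a minimal Python sketch (not Subversion code; the file name is only an illustration) showing that the two forms of a name look the same but are different code point and byte sequences:

```python
import unicodedata

# The same visible name in both normalisation forms.
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")  # 'é' as the single code point U+00E9
name_nfd = unicodedata.normalize("NFD", name_nfc)         # 'e' followed by combining acute U+0301

# The strings render identically but do not compare equal.
print(name_nfc == name_nfd)        # False
print(name_nfc.encode("utf-8"))    # b'caf\xc3\xa9.txt'
print(name_nfd.encode("utf-8"))    # b'cafe\xcc\x81.txt'

# Normalising both sides to one form makes them comparable again.
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)  # True
```

A byte-wise comparison of paths, with no normalisation step, treats these as two unrelated names.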

However, we really do not want to break compatibility at this time.
A solution that does not require us to break compatibility would
be much better. Nobody knows yet when the time for 2.x will come.

As far as I know, HFS+ is the only filesystem that has this problem.
As a workaround, it is possible to use other filesystems on Mac OS X,
for example UFS, ext2, or NTFS (via FUSE).

I think Subversion's backwards compatibility is very important and
should not be jeopardised because of the behaviour of one filesystem.

> If we have two files with the same filename, one in NFC, the other in NFD,
> it is really a headache for us to normalize all paths to NFC. The only thing
> we can do is keep one of the two files and throw the other away.
> In reality, I think this is a rare case. If we find this collision when upgrading
> repositories, we should stop and provide a way for users to choose which
> one to keep.

I agree that this is probably a rare case in practice. However, we must
be prepared to handle it. Users who run into this problem can lose the
ability to use newer versions of Subversion to read their data.
This cannot be allowed to happen because we want to stay compatible.
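
As a rough sketch of how such collisions could be detected before any normalisation is attempted, the following Python groups paths by their NFC form and reports every group with more than one spelling. The function name `find_normalization_collisions` and the plain list of paths are assumptions for illustration; this is not an existing Subversion API, and a real tool would have to walk the repository's path history.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(paths):
    """Return {nfc_form: [spellings]} for paths that differ only by Unicode form."""
    groups = defaultdict(set)
    for path in paths:
        # Paths that are the same name in different forms share one NFC key.
        groups[unicodedata.normalize("NFC", path)].add(path)
    # Only groups with more than one distinct spelling are collisions.
    return {
        nfc: sorted(variants)
        for nfc, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical input: one name stored in both forms, plus an unaffected file.
paths = ["trunk/caf\u00e9.txt", "trunk/cafe\u0301.txt", "trunk/README"]
collisions = find_normalization_collisions(paths)
print(len(collisions))  # 1
```

A check like this only reports the problem. What to do with the colliding files is still the hard part.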

> > As you can see, there is a lot of complexity involved in fixing this
> > issue. I hope you aren't discouraged by this. Someone will need to
> > explore the details of these problems to fix this issue. I am not convinced
> > that it is impossible to fix. We'll need to be very careful about backwards
> > compatibility when making decisions. But there might be ways to achieve a
> > satisfying solution nonetheless.
>
> As other people have said, we should prohibit NFC/NFD collisions of the same
> filename, not in the Subversion system itself but through operational rules:
> just don't do that.

So far, "don't do that" has been the answer to this entire problem.
We've been telling people that if they want to use non-ASCII characters
with both Windows/Linux and Mac OS X clients, they should not use HFS+.

And mixing various Unicode forms works fine today if the filesystem
used by the client supports this. The use case Neels contrived, where
developers want to test their code with Unicode filenames in various
NFD/NFC forms, and check those test files into Subversion, should still
be supported.

> Then the rest of the problem seems rather simple. Convert *all* input paths to NFC
> first, then do the work. All input means paths passed to servers from clients,
> paths obtained by servers from repositories, paths obtained by clients from
> working copies. Is that correct?

Yes, that is correct. The same applies to paths obtained by clients from
the local filesystem, and to paths sent by servers to clients.
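
A minimal Python sketch of the "normalise at every boundary" idea, under the assumption that a client keeps an index from the NFC form of each name to the name actually stored on disk. The helpers `to_nfc`, `index_local_names` and `resolve_local` are hypothetical names for illustration, not Subversion functions.

```python
import unicodedata


def to_nfc(path):
    """Normalise a path to NFC; meant to be applied to every path that enters the system."""
    return unicodedata.normalize("NFC", path)


def index_local_names(on_disk_names):
    """Map the NFC form of each name to the name as stored on disk (possibly NFD)."""
    # If two on-disk names share an NFC form, the later one wins here;
    # a real implementation would have to report that collision instead.
    return {to_nfc(name): name for name in on_disk_names}


def resolve_local(incoming_path, index):
    """Find the local file for a path received in any form, or None if absent."""
    return index.get(to_nfc(incoming_path))


# Hypothetical case: the local filesystem returns the name in NFD,
# while the server sends it in NFC.
index = index_local_names(["cafe\u0301.txt"])
local = resolve_local("caf\u00e9.txt", index)
print(local == "cafe\u0301.txt")  # True
```

The index is what lets the client find its local NFD file when it is given an NFC name. The old client described above has no such mapping.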
Received on 2012-01-30 20:10:42 CET

This is an archived mail posted to the Subversion Dev mailing list.