
Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Fri, 3 Feb 2012 06:46:07 +0900

2012/2/3 Peter Samuelson <peter_at_p12n.org>:
>
> [Hiroaki Nakamura]
>> For existing repositories, I think it would be better to convert them too,
>> using svndump/svnload, and to change svnload to convert filenames to NFC.
>> However in reality we cannot force users to convert every existing repository.
>
> Also note that if you convert a repository (via dump/load or whatever),
> all working copies based on the repository are invalidated and need to
> be re-checked-out. Avoiding _that_ problem would be really hairy, I
> think, very similar to the sort of work that would be needed to support
> obliterate without losing working copies.
>
>> We also need to change servers in order to deal with existing 1.x
>> clients. When web_dav_svn and svnserve receive filenames from
>> clients, they must first convert them to NFC.
>
> You keep saying what we "must" do on the server side. I propose
> something that is purely on the client side. It will solve the OS X /
> non-OS X interoperability problem. It will not solve every problem
> ever faced by a Subversion user. That's a job for 2.0.

OK. When I started this thread, I assumed we wanted to focus on a
long-term solution for 2.x. That's because the short-term solution, option (4)
described in
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
seems too difficult and complex to me.

But if a modified version of my proposal would fit into a short-term 1.x
release, I will gladly modify it.

>
>> Yes, like I said above, "clients" actually includes components that
>> run on servers, like web_dav_svn, and it should be read as any component
>> that accesses repositories and working copies.
>
> No. By "clients" I mean components that run on the client side. If my
> proposal had required changes to mod_dav_svn, I would not have said
> "strictly client-side". I do not propose any change to mod_dav_svn,
> svnserve, svnadmin, libsvn_repos, libsvn_fs, the repository data, or
> anything else on the server side.
>
>> If you think in analogy to ASCII uppercase and lowercase examples,
>> you miss the point. Please reread the Unicode Standard Annex #15
>> UAX #15: Unicode Normalization Forms
>> http://unicode.org/reports/tr15/
>
> Thanks, I've read it. The analogy stands. We could prevent NFC/NFD
> collisions as an additional service to users, something we have not
> done for the past 10 years. This would be along the lines of
> preventing users from shooting themselves in the foot.
>
> The actual _software_ problem that is solved by preventing collisions
> is the same as the software problem solved by preventing upper/lower
> case collisions: certain clients are unable to check out a folder that
> has such collisions. (Windows clients, in the case of upper/lower
> collisions; OS X clients, in the case of NFC/NFD collisions.)

Yes, I agree with that.
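
To make the collision case concrete, here is a small sketch in Python. It
uses only the standard unicodedata module; the helper name
find_normalization_collisions and the sample filenames are made up for
illustration and are not part of Subversion. It groups the names in one
directory by their NFC form and reports any group with more than one member,
which is the situation an OS X client cannot check out.

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(names):
    """Return groups of names that differ as code points but share one NFC form."""
    groups = defaultdict(set)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].add(name)
    # Only groups with two or more distinct spellings are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Same visible filename, two different code point sequences.
nfc_name = "caf\u00e9.txt"    # precomposed U+00E9
nfd_name = "cafe\u0301.txt"   # "e" followed by combining acute U+0301

collisions = find_normalization_collisions([nfc_name, nfd_name, "plain.txt"])
```

Here collisions holds a single group containing both spellings of the same
name, while "plain.txt" is left out because nothing else normalizes to it.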

>
> I think we are talking past each other. You are trying to solve two
> distinct but related problems: 1. OS X client-side confusion when faced
> with a non-NFD repository path; 2. NFC/NFD collisions. I am only
> trying to solve problem 1. I'm ignoring problem 2 for two reasons:
>
> (a) Problem 2 requires server-side work and complex compatibility /
> upgrade scenarios (dump/load, re-check-out all wcs, etc).
>
> (b) Problem 2 can be worked around, for new repositories (or
> repositories with no existing collisions), with a pre-commit hook.
>
> ...neither of which are true for my proposal to solve problem 1.
>
> So long as you continue to insist that, to solve problem 1, we must
> also solve problem 2, I'm pretty sure we will never come to any
> agreement.

OK. So how about changing my proposal as follows:
(1) No server modification. Modify only svn_path_cstring_to_utf8.
(2) Let users install a pre-commit hook that rejects any non-NFC filenames.
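
As a rough sketch of what the hook in (2) could look like, here is a Python
version. It is only an illustration under some assumptions: the function
names are made up, svnlook is assumed to be on the PATH and to print paths
as UTF-8, and only added paths (status "A") are checked so that existing
non-NFC files can still be modified or deleted.

```python
import subprocess
import sys
import unicodedata

def non_nfc_paths(changed_lines):
    """Return added paths from `svnlook changed` output that are not in NFC."""
    bad = []
    for line in changed_lines:
        # `svnlook changed` prints status flags in the first columns,
        # and the path starts at the fifth character.
        status, path = line[:4], line[4:]
        if status.startswith("A") and unicodedata.normalize("NFC", path) != path:
            bad.append(path)
    return bad

def main(argv):
    # Subversion calls a pre-commit hook with REPOS and TXN-NAME.
    repos, txn = argv[1], argv[2]
    out = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True,
    ).stdout.decode("utf-8")  # assumption: svnlook output is UTF-8
    bad = non_nfc_paths(out.split("\n"))
    for path in bad:
        sys.stderr.write("Path is not NFC-normalized: %r\n" % path)
    # A non-zero exit status makes Subversion reject the commit.
    return 1 if bad else 0

# In a real hook script, finish with: sys.exit(main(sys.argv))
```

Anything the hook writes to stderr is sent back to the committing client, so
the user sees which path was rejected.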

This way, we only need to change one function. The modification is just like
the original OS X Unicode path patch:
utf8precompose_macosx_2.patch
http://subversion.tigris.org/nonav/issues/showattachment.cgi/813/utf8precompose_macosx_2.patch
in
http://subversion.tigris.org/issues/show_bug.cgi?id=2464

The only difference between the original patch and mine is that mine will
use utf8proc, so that it works on all platforms: Mac OS X, Windows,
and Linux.
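
To show the intended behavior (not the actual patch, which would be C code
calling utf8proc), here is a minimal sketch in Python. The function name
path_to_utf8_nfc and its native_encoding parameter are hypothetical, and
Python's unicodedata module stands in for utf8proc. The idea is that a path
in the native encoding is converted to UTF-8 and composed to NFC in the same
step, so NFD names coming from OS X and NFC names coming from other
platforms end up as the same bytes.

```python
import unicodedata

def path_to_utf8_nfc(native_path: bytes, native_encoding: str = "utf-8") -> bytes:
    """Decode a native-encoded path, compose it to NFC, and return UTF-8 bytes."""
    text = native_path.decode(native_encoding)
    return unicodedata.normalize("NFC", text).encode("utf-8")

# An OS X style (decomposed) name and the precomposed name used elsewhere.
nfd_bytes = "cafe\u0301.txt".encode("utf-8")
nfc_bytes = "caf\u00e9.txt".encode("utf-8")

converted = path_to_utf8_nfc(nfd_bytes)
```

With this, converted is byte-for-byte equal to nfc_bytes, and a path that is
already in NFC passes through unchanged.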

-- 
)Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-02 22:46:38 CET

This is an archived mail posted to the Subversion Dev mailing list.