
Re: UTF-8 NFC/NFD paths issue

From: Branko Čibej <brane_at_xbc.nu>
Date: Mon, 20 Sep 2010 14:49:53 +0200

On 17.09.2010 13:22, Greg Stein wrote:
> On Thu, Sep 16, 2010 at 19:26, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> Greg Stein wrote on Thu, Sep 16, 2010 at 00:59:59 -0400:
>>> On Wed, Sep 15, 2010 at 23:35, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
>> ...
>>>> If yes, then we infer that no two in-repository paths (which are
>>>> bytewise different) canonicalize to the same byte sequence. Which is
>>>> pretty useful precondition to have, i.e., what /can/ svn do on a legacy
>>>> repository where some two paths are bytewise-different yet Unicode-equal?
>> (I assume you're replying to my second paragraph)
>>> This will be *very* difficult to manage. Even if a given repository
>>> somehow manages to rewrite history to "fix" the paths, then you may
>>> have an unknown number of downstream synchronized repositories to
>>> similarly fix.
>>>
>>> I think an answer might be to rely on the upcoming obliterate
>>> feature's "out of band" change descriptions. For example, a repository
>>> might tell a working copy, "hey: file XYZ was obliterated since you
>>> last talked to me. if you happen to have it, then get rid of it. I
>>> won't recognize it henceforth." You can see a similar descriptor sent
>>> to working copies or repositories that says "I recoded XYZ. update to
>>> the new encoding."
>>>
>> I don't see why this needs to be special-cased? The server can simply
>> send "rename(NFD(é), NFC(é))" and the wc library can figure for itself
>> that it's inoperative for her in the same place she determines that
>> "rename('foo','FOO')" is inoperative for her (when the filesystem is
>> case-insensitive).
> When does the server send that? If the wc is at r1000, and the server
> is at r1000, then the standard update response is nil.

And of course there is no such place in the client code ... AFAIK we
still have the case-changing-rename bug. Even if that were not the case,
some filesystems will do their own canonicalization; I believe it's NFD
on Mac OS and NFC on Windows, at least with NTFS.

Which in turn means that what's written to wc-db might not match what's
on disk.

-- Brane
Received on 2010-09-20 14:50:52 CEST

This is an archived mail posted to the Subversion Dev mailing list.
