Re: UTF-8 NFC/NFD paths issue

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Sun, 19 Sep 2010 09:54:32 +0100

Greg Stein wrote on Sat, Sep 18, 2010 at 15:55:57 -0400:
> On Sat, Sep 18, 2010 at 04:42, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> > Greg Stein wrote on Fri, Sep 17, 2010 at 07:22:12 -0400:
> >> On Thu, Sep 16, 2010 at 19:26, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> >> > Greg Stein wrote on Thu, Sep 16, 2010 at 00:59:59 -0400:
> >> >> On Wed, Sep 15, 2010 at 23:35, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> >> >...
> >> >> > If yes, then we infer that no two in-repository paths (which are
> >> >> > bytewise different) canonicalize to the same byte sequence.  Which is
> >> >> > a pretty useful precondition to have, i.e., what /can/ svn do on a legacy
> >> >> > repository where some two paths are bytewise-different yet Unicode-equal?
> >> >>
> >> >> This will be *very* difficult to manage. Even if a given repository
> >> >> somehow manages to rewrite history to "fix" the paths, then you may
> >> >> have an unknown number of downstream synchronized repositories to
> >> >> similarly fix.
> >> >>
> >> >> I think an answer might be to rely on the upcoming obliterate
> >> >> feature's "out of band" change descriptions. For example, a repository
> >> >> might tell a working copy, "hey: file XYZ was obliterated since you
> >> >> last talked to me. if you happen to have it, then get rid of it. I
> >> >> won't recognize it henceforth." You can see a similar descriptor sent
> >> >> to working copies or repositories that says "I recoded XYZ. update to
> >> >> the new encoding."
> >> >>
> >> >
> >> > I don't see why this needs to be special-cased?  The server can simply
> >> > send "rename(NFD(é), NFC(é))" and the wc library can figure for itself
> >> > that it's inoperative for her in the same place she determines that
> >> > "rename('foo','FOO')" is inoperative for her (when the filesystem is
> >> > case-insensitive).
> >>
> >> When does the server send that? If the wc is at r1000, and the server
> >> is at r1000, then the standard update response is nil.
> >>
> >> Yet if an administrator comes along and renames the repository paths
> >> to NFC, then *something* needs to return in an update response. I see
> >> it as "not part of the update request", and that there is an
> >> out-of-band response that details such changes. ie. changes that occur
> >> outside the revision numbering flow.
> >>
> >
> > i.e., out-of-band is one choice, and making it a proper revision is the
> > other choice.
> >
> > ?
>
> That doesn't solve the historical revisions containing "bad" paths. My
> understanding of the problem was that we'd go into the past and
> rewrite the paths into a single, canonical form.
>

Agreed: an out-of-band solution fixes things historically too.

Having the backend enforce NFC can wait for 2.0, I suppose :)
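
For concreteness, here is a rough sketch of the two checks discussed in this thread: finding paths that are bytewise-different yet Unicode-equal, and recognizing a rename that only changes the normalization form. This is plain Python using the standard unicodedata module, not Subversion code; the function names and the sample paths are made up for illustration.

```python
import unicodedata
from collections import defaultdict


def nfc_collisions(paths):
    """Group paths by their NFC form and return only the groups that
    contain more than one distinct spelling (hypothetical helper)."""
    groups = defaultdict(set)
    for path in paths:
        groups[unicodedata.normalize("NFC", path)].add(path)
    return {nfc: sorted(spellings)
            for nfc, spellings in groups.items()
            if len(spellings) > 1}


def is_normalization_only_rename(src, dst):
    """True when src and dst differ as strings but are canonically
    equivalent, e.g. rename(NFD(é), NFC(é)) (hypothetical helper)."""
    return (src != dst and
            unicodedata.normalize("NFC", src) == unicodedata.normalize("NFC", dst))


# Sample data, invented for this example.
nfc_path = "trunk/caf\u00e9.txt"    # precomposed é (U+00E9)
nfd_path = "trunk/cafe\u0301.txt"   # e followed by combining acute (U+0301)
paths = [nfc_path, nfd_path, "trunk/README"]

collisions = nfc_collisions(paths)
for canonical, spellings in collisions.items():
    print(canonical.encode("utf-8"), [s.encode("utf-8") for s in spellings])
```

A non-empty result from nfc_collisions() is exactly the legacy-repository case asked about upthread. Note that rename('foo','FOO') is not a normalization-only rename: case folding is a separate question from canonical equivalence.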

> Cheers,
> -g
Received on 2010-09-19 10:56:53 CEST

This is an archived mail posted to the Subversion Dev mailing list.
