
Re: Let's discuss about unicode compositions for filenames!

From: Daniel Shahaf <danielsh_at_elego.de>
Date: Thu, 2 Feb 2012 22:55:38 +0200

Hiroaki Nakamura wrote on Fri, Feb 03, 2012 at 05:33:02 +0900:
> 2012/2/3 Daniel Shahaf <danielsh_at_elego.de>:
> > Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100:
> >> On 02.02.2012 20:22, Peter Samuelson wrote:
> >> > [Hiroaki Nakamura]
> >> >> In option (2), we do n12n on all clients on all platforms, and we
> >> >> include mod_dav_svn in "clients". So we convert all input paths to
> >> >> the "server encoding", which is NFC.
> >> > Indeed.  But the very concept of a "server encoding" means we are
> >> > involving the server side.  Which invokes a lot of difficult questions
> >> > like "what about existing 1.x clients", "what about existing checkouts"
> >> > and "what about existing repositories".
> >> >
> >> > By proposing a client-only solution, I hope to avoid _all_ those
> >> > questions.
> >>
> >> Can't see how that works, unless you either make the client-side
> >> solution optional, create a mapping table, or make name lookup on the
> >> server agnostic to character representation. I can't envision how any of
> >> those solutions would work all the time.
> >>
> >> It would be nice if we could normalize paths in the repository without
> >> having to perform a dump/reload cycle, but I don't know how that would
> >> work in FSFS
> >
> > It won't.  Changing the encoding increases the length (in bytes) of the
> > string (in the dirents hash, for example), and thus changes the offsets
> > of the node-revs that are later in the file --- to which subsequent
> > revisions, and the id's of those node-revs, refer.
>
> Changing from NFD to NFC does not increase the length.
> The length will be the same or smaller, not larger.
>

If the conversion is guaranteed to be monotone non-increasing (in
length) then I believe it could be made to work "in place".
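
For anyone who wants to spot-check the length claim on a few names, here
is a minimal sketch, assuming Python's standard unicodedata module.  The
helper compare_forms and the sample filenames are made up for
illustration, and a handful of samples is only a sanity check, not proof
that the property holds for every string.

```python
import unicodedata


def utf8_len(s):
    """Length of s in bytes once encoded as UTF-8."""
    return len(s.encode("utf-8"))


def compare_forms(name):
    """Return (NFD byte length, NFC byte length) for a path component.

    The name is first decomposed to NFD and the result is then
    recomposed to NFC, mirroring an NFD -> NFC conversion.
    """
    nfd = unicodedata.normalize("NFD", name)
    nfc = unicodedata.normalize("NFC", nfd)
    return utf8_len(nfd), utf8_len(nfc)


# Hypothetical sample filenames: Latin with an accent, Japanese
# katakana with voiced-sound marks, and plain ASCII.
samples = [
    "caf\u00e9.txt",
    "\u30d0\u30fc\u30b8\u30e7\u30f3.txt",
    "plain-ascii.txt",
]

for name in samples:
    nfd_len, nfc_len = compare_forms(name)
    print(f"{name!r}: NFD={nfd_len} bytes, NFC={nfc_len} bytes")
```

For these three samples the NFC form is never longer than the NFD form,
which is the condition an in-place rewrite would depend on.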

As to keeping concurrent readers and preexisting working copies sane ---
for now I'm LAAEFTR'ing that.

> Here I quote from
> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
> > The proposed internal 'normal form' should be NFC, if only if
> > it were because it's the most compact form of the two: when
> > allocating memory to store a conversion result, it won't be
> > necessary (ever) to allocate more than the size of the input buffer.
>
>
> --
> )Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-02 21:56:19 CET

This is an archived mail posted to the Subversion Dev mailing list.