[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Non-normalizing Unicode Composition Awareness

From: Branko ÄŒibej <brane_at_wandisco.com>
Date: Fri, 09 Nov 2012 13:49:59 +0100

On 09.11.2012 12:28, Thomas Ã…kesson wrote:
> Today, I noticed that Branko started some implementation in a branch. Looks like a collation based on utf8proc is in the making? I think that would make a lot of sense because the ICU extension poses some challenges in the build process and we might not need all that functionality that it provides.

Hi Thomas,

Yes, I started a branch that's intended to fix the normalization
problem. I selected utf8proc because we really don't need ICU (I can't
see a serious need for language-specific case folding, for example, nor
for Unicode regular expressions). Furthermore, utf8proc can be easily
embedded into Subversion so it doesn't present another dependency that
users would have to worry about.

I'm currently doing the grunt work of implementing the collation (done)
and the LIKE and GLOB operators that we'll need (in progress). The next,
and biggest, step will be to review the client and WC libraries to make
sure that paths sent to the server always come from the wc.db, not from
disk.

One open question is what to do about (historical) collisions in
existing repositories, but I don't think that issue is important enough
to resolve now.

It'll take a while, but I hope to be able to finish the work in time for
1.8. If not ... well then, it'll be in 1.9.

-- Brane

-- 
Branko ÄŒibej
Director of Subversion | WANdisco | www.wandisco.com
Received on 2012-11-09 13:50:41 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.