
Re: [RFC] Non-normalizing Unicode Composition Awareness

From: Branko Čibej <brane_at_wandisco.com>
Date: Fri, 09 Nov 2012 13:49:59 +0100

On 09.11.2012 12:28, Thomas Åkesson wrote:
> Today, I noticed that Branko started some implementation in a branch. Looks like a collation based on utf8proc is in the making? I think that would make a lot of sense because the ICU extension poses some challenges in the build process and we might not need all that functionality that it provides.

Hi Thomas,

Yes, I started a branch that's intended to fix the normalization
problem. I selected utf8proc because we really don't need ICU (I can't
see a serious need for language-specific case folding, for example, nor
for Unicode regular expressions). Furthermore, utf8proc can be easily
embedded into Subversion so it doesn't present another dependency that
users would have to worry about.
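
To illustrate the kind of composition-aware collation being discussed, here
is a minimal sketch. It is not the code from the branch: Python's
unicodedata module stands in for utf8proc, the collation name UNICODE_NFD is
made up, and the table and column names are only illustrative. The stored
value is left untouched; only the comparison normalizes.

```python
import sqlite3
import unicodedata


def normalized_compare(a: str, b: str) -> int:
    """Three-way comparison of two strings after NFD normalization."""
    # Assumption: unicodedata plays the role utf8proc would play in C.
    na = unicodedata.normalize("NFD", a)
    nb = unicodedata.normalize("NFD", b)
    return (na > nb) - (na < nb)


conn = sqlite3.connect(":memory:")
# "UNICODE_NFD" is a hypothetical collation name chosen for this sketch.
conn.create_collation("UNICODE_NFD", normalized_compare)
conn.execute("CREATE TABLE nodes (local_relpath TEXT)")

composed = "\u00c5kesson.txt"     # precomposed A WITH RING ABOVE
decomposed = "A\u030akesson.txt"  # A followed by COMBINING RING ABOVE

# Store the name exactly as given; nothing is normalized on disk.
conn.execute("INSERT INTO nodes VALUES (?)", (composed,))

# Default (binary) comparison treats the two spellings as different names.
plain = conn.execute(
    "SELECT count(*) FROM nodes WHERE local_relpath = ?",
    (decomposed,),
).fetchone()[0]

# The custom collation treats them as the same name.
aware = conn.execute(
    "SELECT count(*) FROM nodes WHERE local_relpath = ? COLLATE UNICODE_NFD",
    (decomposed,),
).fetchone()[0]
```

With the default comparison the lookup finds nothing, while the lookup using
the custom collation finds the stored row.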

I'm currently doing the grunt work of implementing the collation (done)
and the LIKE and GLOB operators that we'll need (in progress). The next,
and biggest, step will be to review the client and WC libraries to make
sure that paths sent to the server always come from the wc.db, not from

One open question is what to do about (historical) collisions in
existing repositories, but I don't think that issue is important enough
to resolve now.
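
For reference, spotting such collisions in a list of paths could look
roughly like the sketch below. It assumes a collision means two distinct
paths that become identical after normalization; find_collisions is a
hypothetical helper, and Python's unicodedata again stands in for utf8proc.
It says nothing about how collisions should be resolved.

```python
import unicodedata
from collections import defaultdict


def find_collisions(paths):
    """Group paths that differ only in Unicode composition.

    Returns a dict mapping each NFC-normalized path to the sorted list of
    distinct original spellings, for keys with more than one spelling.
    """
    groups = defaultdict(set)
    for path in paths:
        groups[unicodedata.normalize("NFC", path)].add(path)
    # Keep only normalized forms reached by two or more distinct spellings.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


sample = ["trunk/\u00c5.txt", "trunk/A\u030a.txt", "trunk/b.txt"]
collisions = find_collisions(sample)
```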

It'll take a while, but I hope to be able to finish the work in time for
1.8. If not ... well then, it'll be in 1.9.

-- Brane

Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
Received on 2012-11-09 13:50:41 CET

This is an archived mail posted to the Subversion Dev mailing list.