Re: Standardizing on UTF8 internally isn't enough

From: B. Smith-Mannschott <benpsm_at_gmail.com>
Date: 2007-07-19 15:40:50 CEST

On 7/19/07, Mark Phippard <markphip@gmail.com> wrote:
> On 7/19/07, Erik Huelsmann <ehuels@gmail.com> wrote:

...[snip]...

> I am not against a better solution, but I still think a "good enough"
> solution would be a big improvement.
>
> > PS: The ICU library can be reduced in size to 1/10th if you
> > only want normalization.
> > PPS: Did people really revolt against apr-iconv because of its size?
>
> No, it was not the size. Although that was a bonus. The problem is
> that the library is poorly designed in terms of supporting multiple
> applications using multiple versions or even built with different
> compilers.
>
> I'd like to see a real number on size. The ICU web site doesn't seem
> to give sizes as big as what has been quoted here.

Perhaps we should consider an alternative to ICU if it looks like
it'll be a problematic dependency. I poked around and found that
Python [1.*] and Perl [2] apparently each carry their own
normalization code instead of depending on ICU:

[1.1] http://svn.python.org/view/python/trunk/Modules/unicodedata.c
[1.2] http://svn.python.org/view/python/trunk/Modules/unicodedata_db.h
[1.3] http://svn.python.org/view/python/trunk/Modules/unicodename_db.h
[2] http://search.cpan.org/~sadahiro/Unicode-Normalize-1.02/Normalize.pm

They both provide NFC and NFD normalization. Perhaps svn could use or
adapt one of these implementations? (License compatibility?)
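
For a feel of what these normalization forms do, here is a minimal
sketch using Python's unicodedata module. The same_name() helper and
the sample strings are made up for illustration; nothing here is
meant as what svn would actually do.

```python
import unicodedata

# Two spellings of the same visible text "é":
composed = "\u00e9"       # one precomposed code point (what NFC yields)
decomposed = "e\u0301"    # 'e' plus a combining acute accent (what NFD yields)

def same_name(a, b, form="NFC"):
    """Hypothetical helper: compare two names after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(composed == decomposed)           # raw comparison of code points
print(same_name(composed, decomposed))  # comparison after normalization
```

The raw comparison fails because the code point sequences differ,
while the normalized comparison treats the two spellings as equal.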

// bsmith@occs

Received on Thu Jul 19 15:40:00 2007

This is an archived mail posted to the Subversion Dev mailing list.