Garret Wilson <garret_at_globalmentor.com> writes:
> On 1/23/2012 10:38 AM, Philip Martin wrote:
>> Garret Wilson<garret_at_globalmentor.com> writes:
>>
>>> On 1/23/2012 9:55 AM, Philip Martin wrote:
>>>> I thought you were proposing to write the code?
>>> I'm fine with that as well. Looks like I would have to add a few lines
>>> to decode UTF-8 (surely such code already exists in the Subversion
>>> codebase somewhere) and change a few if(...){} statements. I should be
>>> able to handle that. I would imagine it will take more effort on my
>>> part to get permission to change the code than actually writing the
>>> code itself.
>> The function receives a string of bytes, I think it's already in UTF-8.
>> The problem is that while Subversion has functions to validate UTF-8 it
>> doesn't have a system for extracting individual UTF-8 code points. At
>> present it only ever needs to extract the ASCII subset which is trivial.
>
> Ah. Well, like I said---I would be happy to write the UTF-8 extraction
> code. It would be worth it to me to get this functionality in; it
> would be a fun exercise for me; it would be a good introduction to the
> codebase for me; it's a small (very small), low-risk task; and the
> Subversion codebase would be better off in the end. (I'm sure it can
> be used elsewhere.) It's a win-win for everyone! :D
>
> This is really a small thing. Here's an example in just a few lines:
> http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
>
> Or see DecodeUTF8BytesToChar at
> tidy.sourceforge.net/cgi-bin/lxr/source/src/utf8.c .

Subversion already has UTF-8 code:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/include/private/svn_utf_private.h
http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_subr/utf_validate.c

but it needs an API to extract code points.

The situation is that the low-level svn_fs.h API allows property names
to be any null-terminated C string. The intermediate svn_ra.h API
imposes restrictions because only XML names can be marshalled over
http:; I think svn: allows anything. The high-level svn_client.h API
restricts names to a subset of ASCII and thus avoids passing anything
the RA layers cannot handle.

You want to relax the svn_client.h API to allow XML names. Strictly
speaking, I suppose a third-party RA implementation might support only
the svn_client.h subset, but I don't know of any other RA implementations.
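
To make the missing piece concrete, here is a rough sketch of what a
code-point extraction helper might look like. The name utf8_decode_one
and its signature are hypothetical; nothing like it exists in
svn_utf_private.h today, and a real patch would presumably follow the
Subversion naming and error conventions instead. It decodes a single
code point and rejects truncated input, overlong forms, surrogates and
values above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper, NOT an existing Subversion function.
   Decode one code point from the LEN bytes at SRC.  On success store
   the code point in *CP and return the number of bytes consumed (1-4).
   Return 0 if the input is truncated or is not well-formed UTF-8. */
static size_t
utf8_decode_one(const unsigned char *src, size_t len, unsigned long *cp)
{
  unsigned long c;    /* code point being assembled */
  unsigned long min;  /* smallest value allowed for this sequence length */
  size_t need;        /* total bytes in the sequence */
  size_t i;

  if (len == 0)
    return 0;

  if (src[0] < 0x80)                  /* 0xxxxxxx: plain ASCII */
    {
      *cp = src[0];
      return 1;
    }
  else if ((src[0] & 0xE0) == 0xC0)   /* 110xxxxx: two-byte sequence */
    {
      c = src[0] & 0x1F; need = 2; min = 0x80;
    }
  else if ((src[0] & 0xF0) == 0xE0)   /* 1110xxxx: three-byte sequence */
    {
      c = src[0] & 0x0F; need = 3; min = 0x800;
    }
  else if ((src[0] & 0xF8) == 0xF0)   /* 11110xxx: four-byte sequence */
    {
      c = src[0] & 0x07; need = 4; min = 0x10000;
    }
  else
    return 0;                         /* stray continuation or bad lead byte */

  if (len < need)
    return 0;                         /* truncated sequence */

  for (i = 1; i < need; i++)
    {
      if ((src[i] & 0xC0) != 0x80)    /* continuation must be 10xxxxxx */
        return 0;
      c = (c << 6) | (src[i] & 0x3F);
    }

  /* Reject overlong encodings, out-of-range values and surrogates. */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return need;
}
```

A property-name check could call this in a loop, advancing by the
returned byte count and testing each code point against the XML name
character ranges, stopping at the first 0 return.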
Received on 2012-01-23 20:51:18 CET