
Re: Umlaut problem on Mac (composed vs. decomposed UTF-8)

From: B. Smith-Mannschott <benpsm_at_gmail.com>
Date: 2007-07-15 14:19:27 CEST

On Jul 13, 2007, at 11:20, Thomas Singer wrote:

> First there needs to be consensus *how* to fix it.

This issue *really* annoys me, so I dug around in the code a while
back, despite lacking the C & APR skillz to actually fix it.

http://svn.haxx.se/dev/archive-2007-03/0060.shtml

It looks like SVN just blindly *assumes* that it's getting UTF-8
(composed) when the underlying file system claims to be UTF-8:

   svn_error_t *
   svn_path_cstring_to_utf8(const char **path_utf8,
                            const char *path_apr,
                            apr_pool_t *pool)
   {
     svn_boolean_t path_is_utf8;
     SVN_ERR(get_path_encoding(&path_is_utf8, pool));
     if (path_is_utf8)
       {
         *path_utf8 = apr_pstrdup(pool, path_apr);
         return SVN_NO_ERROR;
       }
     else
       return svn_utf_cstring_to_utf8(path_utf8, path_apr, pool);
   }

Linux systems, as I understand them, just consider file names to be a
sequence of bytes. They don't normalize the encoding either way. I
think SVN only works there because the programs/libraries creating
files on UTF-8 Linux systems all just 'happen' to use UTF-8 (composed).

Mac OS X does standardize this. A file name is not just a 'bunch of
bytes'; it's always UTF-8 (decomposed).
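
To make the composed/decomposed distinction concrete, here is a small
sketch showing that the two spellings of "ü" are different byte strings,
so any byte-wise path comparison treats them as different names. The
names U_COMPOSED, U_DECOMPOSED and same_bytes are made up for this
illustration and are not part of SVN or APR.

```c
#include <assert.h>
#include <string.h>

/* "ü" as a single precomposed code point, U+00FC (UTF-8: C3 BC). */
const char U_COMPOSED[] = "\xC3\xBC";

/* "ü" as 'u' (U+0075) followed by COMBINING DIAERESIS, U+0308
   (UTF-8: 75 CC 88). This is the decomposed spelling. */
const char U_DECOMPOSED[] = "u\xCC\x88";

/* Hypothetical helper: byte-wise equality, which is all a plain
   strcmp-style path comparison does. Returns nonzero when the two
   strings are identical byte for byte. */
int
same_bytes(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  return strcmp(a, b) == 0;
}
```

Both strings render as the same character, but same_bytes(U_COMPOSED,
U_DECOMPOSED) is false: one is two bytes long and the other is three.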

So, the proper thing to do is probably to translate from UTF-8
(decomposed) to UTF-8 (composed) at the interface between SVN and the
underlying file system when running on a Mac, no?

What would be wrong with solving the problem like this?

   /* Hypothetical helper, declared up front so the call below compiles. */
   void
   normalize_utf8_composed(const char **path_utf8);

   svn_error_t *
   svn_path_cstring_to_utf8(const char **path_utf8,
                            const char *path_apr,
                            apr_pool_t *pool)
   {
     svn_boolean_t path_is_utf8;
     SVN_ERR(get_path_encoding(&path_is_utf8, pool));
     if (path_is_utf8)
       {
         *path_utf8 = apr_pstrdup(pool, path_apr);
         /* PLATFORM_USES_DECOMPOSED_UTF8 is a made-up macro that would
            be true on the Mac and false elsewhere. */
         if (PLATFORM_USES_DECOMPOSED_UTF8)
           {
             normalize_utf8_composed(path_utf8);
           }
         return SVN_NO_ERROR;
       }
     else
       return svn_utf_cstring_to_utf8(path_utf8, path_apr, pool);
   }

   void
   normalize_utf8_composed(const char **path_utf8)
   {
     /* ... and then a miracle occurs ... */
   }
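
To give a rough idea of what the "miracle" would have to do, here is a
toy sketch that composes only the six German umlauts (a, o, u, A, O, U
followed by U+0308). It is an assumption-laden illustration, not real
Unicode normalization: a proper version needs the full composition
tables and has to handle every combining mark. The names
umlaut_second_byte and compose_umlauts_in_place are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy table: for a base letter, return the second UTF-8 byte of the
   precomposed letter-with-diaeresis, or 0 if the toy does not know it.
   All six results lie in U+00C4..U+00FC, so the first byte is 0xC3. */
static unsigned char
umlaut_second_byte(char base)
{
  switch (base)
    {
    case 'a': return 0xA4;  /* U+00E4 */
    case 'o': return 0xB6;  /* U+00F6 */
    case 'u': return 0xBC;  /* U+00FC */
    case 'A': return 0x84;  /* U+00C4 */
    case 'O': return 0x96;  /* U+00D6 */
    case 'U': return 0x9C;  /* U+00DC */
    default:  return 0;
    }
}

/* Rewrite s in place, replacing "<base> CC 88" (base + U+0308) with the
   two-byte precomposed form. Each replacement shrinks the string by one
   byte, so no extra storage is needed. Returns the new length. */
size_t
compose_umlauts_in_place(char *s)
{
  const unsigned char *in;
  unsigned char *out;

  assert(s != NULL);
  in = (const unsigned char *)s;
  out = (unsigned char *)s;

  while (*in != '\0')
    {
      unsigned char second = umlaut_second_byte((char)in[0]);
      /* The && chain stops at the terminator, so in[2] is only read
         when in[1] is a real byte. */
      if (second != 0 && in[1] == 0xCC && in[2] == 0x88)
        {
          *out++ = 0xC3;
          *out++ = second;
          in += 3;
        }
      else
        *out++ = *in++;
    }
  *out = '\0';
  return (size_t)(out - (unsigned char *)s);
}
```

Applied to a buffer holding "Mu\xCC\x88ller" (decomposed), this leaves
"M\xC3\xBCller" (composed) behind. Anything outside the toy table, such
as "e" followed by U+0308, passes through unchanged, which is exactly
why the real thing needs proper Unicode data.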

Have I misunderstood the problem?

// bpsm

Received on Sun Jul 15 14:19:24 2007

This is an archived mail posted to the Subversion Dev mailing list.