
Re: UTF-8

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-05-22 23:44:38 CEST

Ok, here's another progress report on inserting the missing UTF-8 code
into Subversion.

I have now completed the patches for clients/cmdline and
libsvn_client. I have manually inspected every single file in each,
and to the best of my knowledge this is _all_ that is required for
these two components. The other libs probably still need some more
patches, but I expect to have gone through them as well by the middle
of next week. My guess is that only a rather small number of further
patches will be required.

I was thinking a bit about whether this should become a branch, but a
branch would probably become too messy to merge fairly quickly, so I'd
advise against it. Better to put it in the trunk directly, with
--disable-utf8 as the default for the time being, of course.

The coding guidelines I've been using are as follows:

Inside libsvn_*:

   All strings are assumed to be UTF-8 encoded

   Therefore, all system calls (direct, via APR, or via libc)
    involving strings have to use converted versions of the strings.
    The conversion is placed as close to the system call as possible
    unless there is a compelling argument to do otherwise (arguments
    to non-static svn_*-functions must always be UTF-8 though).
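
   As a rough illustration of "convert as close to the system call as
   possible", here is a minimal sketch. It is not taken from the patch:
   utf8_to_native() and open_utf8_path() are hypothetical names, and
   the native encoding is simply assumed to be ISO-8859-1 so that the
   example stays self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): convert a UTF-8 string
   to the native encoding, which this sketch ASSUMES is ISO-8859-1.
   Returns a malloc'ed string, or NULL if the input is malformed or
   contains a character that ISO-8859-1 cannot represent. */
static char *
utf8_to_native(const char *utf8)
{
  const unsigned char *src = (const unsigned char *) utf8;
  char *out, *dst;

  assert(utf8 != NULL);
  out = malloc(strlen(utf8) + 1);  /* output is never longer than input */
  if (out == NULL)
    return NULL;
  dst = out;

  while (*src)
    {
      if (*src < 0x80)
        {
          /* Plain ASCII is identical in both encodings. */
          *dst++ = (char) *src++;
        }
      else if ((*src == 0xC2 || *src == 0xC3) && (src[1] & 0xC0) == 0x80)
        {
          /* Two-byte sequence encoding U+0080..U+00FF. */
          *dst++ = (char) (((*src & 0x03) << 6) | (src[1] & 0x3F));
          src += 2;
        }
      else
        {
          /* Malformed, or outside the assumed native repertoire. */
          free(out);
          return NULL;
        }
    }
  *dst = '\0';
  return out;
}

/* Hypothetical library-style function: the argument is UTF-8, and the
   conversion happens only immediately before the libc call. */
static FILE *
open_utf8_path(const char *utf8_path, const char *mode)
{
  FILE *fp;
  char *native_path = utf8_to_native(utf8_path);

  if (native_path == NULL)
    return NULL;
  fp = fopen(native_path, mode);
  free(native_path);
  return fp;
}
```

   The point of this shape is that everything above the fopen() call
   keeps dealing in UTF-8 only; the native form of the path lives for
   just a few lines.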

Inside clients/cmdline:

   Strings are assumed to be encoded with the native character
    encoding unless explicitly marked as being UTF-8 encoded with a
    comment.

   In addition, the following functions return UTF-8 encoded strings:

     - svn_cl__parse_num_args
     - svn_cl__parse_all_args
     - svn_cl__args_to_target_array
     - svn_cl__stringlist_to_array
     - svn_cl__newlinelist_to_array
     - svn_cl__edit_externally (which also expects UTF-8-encoded input)

   Strings that are not part of the above exceptions are recoded
    before being passed to any svn_* function (except svn_cl__*).
    Strings passed to callbacks from libsvn_* are recoded before use.
    Strings that are UTF-8 encoded as part of the above exceptions but
    need to be printed to stdout or similar are decoded back again.

  (The reason for the various exceptions is that they make the
   foo-cmd.c files much, much cleaner.)
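
  To make the client-side rule concrete, here is a small sketch of
  recoding native command-line arguments to UTF-8 in one place, so that
  callers receive UTF-8 and can hand it straight to a library function.
  Again, this is not the code from the patch: native_to_utf8() and
  args_to_utf8_array() are hypothetical names, and the native encoding
  is assumed to be ISO-8859-1 purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): recode a native string,
   ASSUMED here to be ISO-8859-1, into UTF-8.  Returns a malloc'ed
   string, or NULL on allocation failure. */
static char *
native_to_utf8(const char *native)
{
  const unsigned char *src = (const unsigned char *) native;
  char *out, *dst;

  assert(native != NULL);
  out = malloc(2 * strlen(native) + 1);  /* each byte becomes at most two */
  if (out == NULL)
    return NULL;
  dst = out;

  for (; *src; src++)
    {
      if (*src < 0x80)
        {
          /* Plain ASCII is identical in both encodings. */
          *dst++ = (char) *src;
        }
      else
        {
          /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
          *dst++ = (char) (0xC0 | (*src >> 6));
          *dst++ = (char) (0x80 | (*src & 0x3F));
        }
    }
  *dst = '\0';
  return out;
}

/* Hypothetical argument helper: takes native-encoded arguments and
   returns a NULL-terminated array of UTF-8 strings, so the calling
   command code does not need to recode each target itself. */
static char **
args_to_utf8_array(int argc, const char *const *argv)
{
  char **targets = calloc((size_t) argc + 1, sizeof(*targets));
  int i;

  if (targets == NULL)
    return NULL;
  for (i = 0; i < argc; i++)
    {
      targets[i] = native_to_utf8(argv[i]);
      if (targets[i] == NULL)
        {
          /* Release what was converted so far and report failure. */
          while (i-- > 0)
            free(targets[i]);
          free(targets);
          return NULL;
        }
    }
  return targets;
}
```

  Doing the recoding inside the argument helper is the same trade-off
  as the exceptions listed above: one function deviates from the
  "native by default" rule so that its callers stay simple.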

The current diff (against rev 2000) is included as an attachment. The
diff for all libs is not yet complete, but those for clients/cmdline
and libsvn_client are, and may be unmercifully reviewed. :-)

  // Marcus

Received on Wed May 22 23:49:51 2002

This is an archived mail posted to the Subversion Dev mailing list.