
converting unconvertible UTF-8 data

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-07-21 08:05:47 CEST

kfogel@tigris.org writes:
> Log:
> In revisions 2600 and 2598, the Subversion repository has UTF-8 data
> that cannot be converted to ISO-8859-1, among others. (The data is
> Branko Čibej's full name, which I include here just to make this
> revision self-proving).
>
> * subversion/clients/cmdline/log-cmd.c
> (log_message_receiver): Don't error if we encounter unconvertible data,
> just print a placeholder and move on.

Whew!

Okay, now that the immediate problem is fixed, we need to decide how
to deal with this better.

   Problem: A log message may have data with characters that cannot be
            converted from UTF-8 to the local encoding.

The current solution makes "svn log" work again, but loses more
information than it has to. Some revisions may print out with a log
message that says simply:

   "[unconvertible log msg]"

This could be... frustrating... for users :-).

I can think of three improvements, not mutually exclusive:

   1) A --raw option (or whatever it would be called) that tells log
      to print the raw bytes of the data, instead of trying to
      convert. I don't know what other commands this flag might
      affect; only log comes to mind right now. It's your own fault
      if it screws up your tty :-).

   2) A --allow-raw option, meaning: convert if possible, otherwise
      emit the raw data.

   3) Have a fuzzy conversion function that tries to convert all the
      data, but if that fails, converts every character it can and
      replaces the others with ?\XXX (or some standard sequence) to
      indicate the Unicode value of the failed character.

   4) My brain is puny and weak. There are surely other ways to
      address this problem that I'm not thinking of. Suggestions?
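
For illustration, options (1) and (2) could be sketched roughly as below. This is Python rather than Subversion's C, and the function name `render_log_message` and its `mode` parameter are hypothetical stand-ins for the proposed flags, not anything in the actual client. The "strict" mode mirrors the placeholder behavior described above.

```python
PLACEHOLDER = b"[unconvertible log msg]"


def render_log_message(utf8_bytes: bytes, encoding: str, mode: str = "strict") -> bytes:
    """Return the bytes to print for a log message stored as UTF-8.

    mode (hypothetical, mirrors the proposed flags):
      "strict"    - convert, or print a placeholder on failure
      "raw"       - option (1): never convert, emit the stored bytes
      "allow-raw" - option (2): convert if possible, else emit the stored bytes
    """
    if mode == "raw":
        return utf8_bytes
    try:
        # Assumes the stored data is valid UTF-8; only the re-encoding may fail.
        return utf8_bytes.decode("utf-8").encode(encoding)
    except UnicodeEncodeError:
        if mode == "allow-raw":
            return utf8_bytes
        return PLACEHOLDER
```

With an ISO-8859-1 locale, a message containing "Čibej" would come back as the placeholder in "strict" mode and as the untouched UTF-8 bytes in the other two modes.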

Right now I like (3) the best, since it doesn't force the user to do
something different. Of course, we'd have to choose wisely where we
use the fuzzy function -- again, only "log" comes to mind so far.
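
A minimal sketch of what the fuzzy conversion in (3) might look like, again in Python rather than Subversion's C. The name `fuzzy_convert` is hypothetical, and since the proposal leaves the exact escape open ("?\XXX or some standard sequence"), this sketch assumes `?\` followed by the code point in at least four uppercase hex digits. It also assumes the target encoding is ASCII-compatible, so the escape itself is always representable.

```python
def fuzzy_convert(utf8_bytes: bytes, encoding: str = "iso-8859-1") -> bytes:
    """Convert UTF-8 data to `encoding`, escaping characters that do not fit."""
    text = utf8_bytes.decode("utf-8")

    # Fast path: try to convert everything in one go.
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        pass

    # Slow path: convert character by character, escaping the failures.
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Assumed escape format: ?\ plus the Unicode code point in hex.
            out += "?\\{:04X}".format(ord(ch)).encode("ascii")
    return bytes(out)
```

Under these assumptions, the UTF-8 form of "Branko Čibej" converted to ISO-8859-1 yields the bytes for `Branko ?\010Cibej`, so only the one character is lost instead of the whole message.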

Thoughts?
-K

Received on Sun Jul 21 08:18:25 2002

This is an archived mail posted to the Subversion Dev mailing list.
