
Re: Are log messages Unicode?

From: Barry Scott <barry_at_barrys-emacs.org>
Date: Sun, 13 Jul 2008 00:01:00 +0100

On Jul 12, 2008, at 15:54, Barry Scott wrote:

>
> On Jul 7, 2008, at 17:15, Karl Fogel wrote:
>
>> "Ben Collins-Sussman" <sussman_at_red-bean.com> writes:
>>> On Sun, Jul 6, 2008 at 5:23 AM, Barry Scott <barry_at_barrys-emacs.org> wrote:
>>>> Using the svn_client API, is it possible for a client to write
>>>> non-UTF-8 log messages?
>>>> Clearly if this happened it would be a bug in the client given the
>>>> above statement.
>>>
>>> I don't recall the details, but it's actually the *programmers'* burden
>>> to convert paths and log messages from native locale to UTF8 (and back
>>> again). If you read the svn APIs, you'll notice that every path and log
>>> message passed into APIs (or passed around between APIs) is presumed to
>>> *already* be UTF8. So if you're writing your own client, it's your job to
>>> convert user input to UTF8 before passing to svn_client_*(). Look at the
>>> commandline client to see how it's doing that; I believe there are a
>>> number of convenience routines in libsvn_subr to help with conversion.
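A minimal sketch of the conversion described above, written in Python rather than with the libsvn_subr routines: decode the user's input from the native encoding, then re-encode it as UTF-8 before it reaches svn_client_*(). The helper name to_utf8 and the ISO-8859-1 sample are assumptions for illustration, not part of any Subversion API.

```python
import locale


def to_utf8(raw: bytes, native_encoding: str) -> bytes:
    """Re-encode bytes from the user's native encoding as UTF-8.

    Hypothetical helper: raises UnicodeDecodeError if `raw` is not valid
    in `native_encoding`, so bad input is caught before a commit.
    """
    return raw.decode(native_encoding).encode("utf-8")


# Encoding preferred by the current locale; a real client would start here.
native = locale.getpreferredencoding(False)

# Assumed sample: "réservé" as typed in an ISO-8859-1 locale.
utf8_message = to_utf8(b"r\xe9serv\xe9", "iso-8859-1")
```

The converted bytes, not the raw locale bytes, are what a client would hand to the svn APIs as the log message.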
>>
>> I think Barry's asking if the client and/or server do any validation.
>> That is, if the programmer supplies a non-UTF8 log message, our client
>> libraries should reject it; and if such a log message were to reach the
>> repository (perhaps because someone wrote their own client software from
>> scratch), the repository should reject it too.
>>
>> I don't know whether we do such validation or not, but agree we should.
>>
>> Barry, got time to test/trace it?
>>
>
> I have the dump of the repos that causes pysvn to fail. In the attachment is
> the fragment of the dump file for r219 that causes the problems. If you need
> the whole 3MB of the full dump I'll have to ask permission to pass it on to
> you.
>
> Python cannot decode the svn:log as utf-8.
>
> $ python2.5 extract_log_text.py
> 'Bitbucket r\xe9serv\xe9 \xe0 dev/null\nClassement dans Mail/spam seulement apr\xe8s le localstart qui lance spamc\n'
> '\xe9s'
> Traceback (most recent call last):
>   File "extract_log_text.py", line 12, in <module>
>     print log.decode( 'utf-8' )
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 11-13: invalid data
>
> Is this proof that the repos has non-UTF-8 log text?
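One way to check this without the full dump, sketched under the assumption that the bytes printed above are exactly what is stored in svn:log: attempt a strict UTF-8 decode and report where it fails. The names find_utf8_error and log_bytes are illustrative, and ISO-8859-1 is only a guess at the encoding the committer used.

```python
from typing import Optional

# The svn:log bytes as printed by extract_log_text.py above.
log_bytes = (
    b"Bitbucket r\xe9serv\xe9 \xe0 dev/null\n"
    b"Classement dans Mail/spam seulement apr\xe8s le localstart qui lance spamc\n"
)


def find_utf8_error(data: bytes) -> Optional[int]:
    """Return the offset of the first invalid UTF-8 byte, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


offset = find_utf8_error(log_bytes)

# Assumption: the message was written in ISO-8859-1 (Latin-1).
# Note that a Latin-1 decode never fails, so success here proves nothing;
# it is only useful for eyeballing whether the text looks right.
as_latin1 = log_bytes.decode("iso-8859-1")
```

A non-None offset means the property is not valid UTF-8. Here the first \xe9 sits at byte offset 11, which is where the range in the traceback begins.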
>
> svn 1.4.6 is happy to show the log:
>
> $ svn log -r219 file:///Users/barry/tmp/repos/trunk/dotfiles
> ------------------------------------------------------------------------
> r219 | bortzmeyer | 2003-01-17 14:04:31 +0000 (Fri, 17 Jan 2003) | 3 lines
>
> Bitbucket r?\233serv?\233 ?\224 dev/null
> Classement dans Mail/spam seulement apr?\232s le localstart qui lance spamc
>
> ------------------------------------------------------------------------
>
> But as I understand it, each \233 is supposed to be an é.
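The ?\233 sequences look like three-digit decimal escapes for bytes that svn could not interpret as UTF-8. Assuming that reading, and assuming the original text was ISO-8859-1, a short Python sketch can turn the displayed text back into the intended characters. unescape_fuzzy is a hypothetical name, not an svn function.

```python
import re

# Matches the "?\ddd" form seen in the svn log output above.
_ESCAPE = re.compile(rb"\?\\(\d{3})")


def unescape_fuzzy(shown: bytes) -> bytes:
    """Replace each ?\\ddd escape with the single byte it stands for."""

    def to_byte(match):
        value = int(match.group(1))
        # Leave anything that cannot be a byte value untouched.
        return bytes([value]) if value <= 255 else match.group(0)

    return _ESCAPE.sub(to_byte, shown)


shown = rb"Bitbucket r?\233serv?\233 ?\224 dev/null"
raw = unescape_fuzzy(shown)

# Assumption: the committer's locale was ISO-8859-1.
text = raw.decode("iso-8859-1")
```

Decimal 233, 224 and 232 are 0xE9, 0xE0 and 0xE8, the same bytes the Python 2.5 repr shows, which map to é, à and è in ISO-8859-1.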
>
> Barry
>
>

Oops, forgot the attachment.

Barry


Received on 2008-07-13 01:01:29 CEST

This is an archived mail posted to the Subversion Dev mailing list.
