
Re: Are log messages Unicode?

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Sun, 13 Jul 2008 23:37:53 +0200

Hi list, long time no see :)

Daniel Shahaf wrote:
> Karl Fogel wrote on Mon, 7 Jul 2008 at 12:15 -0400:
>> "Ben Collins-Sussman" <sussman_at_red-bean.com> writes:
>>> On Sun, Jul 6, 2008 at 5:23 AM, Barry Scott <barry_at_barrys-emacs.org> wrote:
>>>> Using the svn_client API is it possible for a client to write
>>>> non-UTF-8 log messages?
>>>> Clearly if this happened it would be a bug in the client given the
>>>> above statement.
>>> I don't recall the details, but it's actually the *programmers'*
>>> burden to convert paths and log messages from native locale to UTF8
>>> (and back again). If you read the svn APIs, you'll notice that every
>>> path and log message passed into APIs (or passed around between APIs)
>>> is presumed to *already* be UTF8. So if you're writing your own
>>> client, it's your job to convert user input to UTF8 before passing to
>>> svn_client_*(). Look at the commandline client to see how it's doing
>>> that; I believe there are a number of convenience routines in libsvn_subr
>>> to help with conversion.
>> I think Barry's asking if the client and/or server do any validation.
>> That is, if the programmer supplies a non-UTF8 log message, our client
>> libraries should reject it; and if such a log message were to reach the
>> repository (perhaps because someone wrote their own client software from
>> scratch), the repository should reject it too.
>>
>> I don't know whether we do such validation or not, but agree we should.
>>
>
> Since r31614 (Neels' fix of issue #1796) we do UTF-8 validation of log
> messages in libsvn_repos. It has not been backported to 1.5.x.

Quoting message "[PATCH] issue 1796: ..." from 03 Jun 2008 by me:

"
The Subversion server and client do not validate props in places where
they should:
- where the server receives props from a client out there. (#1796)
- where the server reads props from the repository file system.
- where the svn client reads props from a server out there.
(Approval by kfogel)

[My] patch fixes only the specific problem of issue 1796:
- where the server receives props from a client out there (#1796),
and it is limited to the log message prop (SVN_PROP_REVISION_LOG).
"

I still intend to continue working on these issues. (I was diverted by
the shock of a recent, unexpected death in my close family.)

I am still trying to find out:

- the best place to validate props being read from the repository file
system by the server;

- how to write a unit test on whether the server validates props read
from the file system (the code that writes *to* the file system now
validates props; so, how do I get *unvalidated* props written to the
file system in the first place?);

- the best place to validate props in the client, reading from a server
out there;

- how to write a unit test on whether the client validates props read
from a server out there;

- which other props need to be validated;

- what the formats for these other props are (are they, by chance, all
UTF8 & LF? That would be nice.).

Since more people are now taking an interest in these issues, maybe it
would make sense to file separate issues in the issue tracker for the
two remaining cases:

- where the server reads props from the repository file system.
- where the svn client reads props from a server out there.

>
> The cmdline client also does some conversions; in my case, it
> dropped the bytes it couldn't understand:
>
> % svn ci iota -F dump-fragment.txt
> Sending iota
> Transmitting file data .
> Committed revision 2.
>
> # It should have failed. Let's see...
> % xxd ../../repos1/db/revprops/0/2
> ...
> 00000a0: 370a 7376 6e3a 6c6f 670a 5620 3130 310a 7.svn:log.V 101.
> 00000b0: 4269 7462 7563 6b65 7420 7273 6572 7620 Bitbucket rserv
> 00000c0: 2064 6576 2f6e 756c 6c0a 436c 6173 7365 dev/null.Classe
> ...
>
> # Ah, but that's not the log message I specified!
> % xxd dump-fragment.txt
> 0000040: 380a 0a4b 2037 0a73 766e 3a6c 6f67 0a56 8..K 7.svn:log.V
> 0000050: 2031 3031 0a42 6974 6275 636b 6574 2072 101.Bitbucket r
> 0000060: e973 6572 76e9 20e0 2064 6576 2f6e 756c .serv. . dev/nul
> # It dropped these bytes: e9 at 0x60, e9 at 0x65, e0 at 0x67
>
>> Barry, got time to test/trace it?

Hm, that's not nice. Silently dropping bytes is bad; at the very least,
the user should be told what is happening.

-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

Received on 2008-07-13 23:38:33 CEST

This is an archived mail posted to the Subversion Dev mailing list.
