
Re: Are log messages Unicode?

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Tue, 05 Aug 2008 01:51:17 +0200

Daniel Shahaf wrote:
> (patch manager hat on)
>
> Neels Janosch Hofmeyr wrote on Sun, 13 Jul 2008 at 23:37 +0200:
>> Daniel Shahaf wrote:
>>> Karl Fogel wrote on Mon, 7 Jul 2008 at 12:15 -0400:
>>>> I think Barry's asking if the client and/or server do any validation.
>>>> That is, if the programmer supplies a non-UTF8 log message, our client
>>>> libraries should reject it; and if such a log message were to reach the
>>>> repository (perhaps because someone wrote their own client software from
>>>> scratch), the repository should reject it too.
>>>>
>>>> I don't know whether we do such validation or not, but agree we should.
>>>>
>>> Since r31614 (Neels' fix of issue #1796) we do UTF-8 validation of log
>>> messages in libsvn_repos. It has not been backported to 1.5.x.
>> Quoting message "[PATCH] issue 1796: ..." from 03 Jun 2008 by me:
>>
>> "
>> The subversion server and client do not validate props in places where
>> they should:
>> - where the server receives props from a client out there. (#1796)
>> - where the server reads props from the repository file system.
>> - where the svn client reads props from a server out there.
>> (Approval by kfogel)
>>
>> [My] patch starts by fixing the specific problems of issue 1796, only:
>> - where the server receives props from a client out there. (#1796)
>> , and limited only to the log message prop (SVN_PROP_REVISION_LOG).
>> "
>>
>> I am still intending to continue on these issues... (I have been
>> diverted because of the social shock following a recent unexpected death
>> in my close family)
>>
>> I am still at the point where I am trying to find out
>>
>
> Comments, anyone?
>
> Neels, I think you can answer some of these questions yourself :)
>
>> - the best place to validate props being read from the repository file
>> system by the server;
>>
>> - how to write a unit test on whether the server validates props read
>> from the file system (the code that writes *to* the file system now
>> validates props; so, how do I get *unvalidated* props written to the
>> file system in the first place?);
>>
>> - the best place to validate props in the client, reading from a server
>> out there;
>>
>> - how to write a unit test on whether the client validates props read
>> from a server out there;
>>
>> - which other props need to be validated;
>>
>> - what the formats for these other props are (are they, by chance, all
>> UTF8 & LF? That would be nice.).
>>
>> Since other/more people are taking interest in these issues, maybe it
>> would make sense to file separate issues in the issue tracker for the
>> remaining two cases? :
>>
>> - where the server reads props from the repository file system.
>> - where the svn client reads props from a server out there.
>>
>>> The cmdline client also does some conversions; in my case, it
>>> dropped the bytes it couldn't understand:
>>>
>>> % svn ci iota -F dump-fragment.txt
>>> Sending iota
>>> Transmitting file data .
>>> Committed revision 2.
>>>
>>> # It should have failed. Let's see...
>>> % xxd ../../repos1/db/revprops/0/2
>>> ...
>>> 00000a0: 370a 7376 6e3a 6c6f 670a 5620 3130 310a 7.svn:log.V 101.
>>> 00000b0: 4269 7462 7563 6b65 7420 7273 6572 7620 Bitbucket rserv
>>> 00000c0: 2064 6576 2f6e 756c 6c0a 436c 6173 7365 dev/null.Classe
>>> ...
>>>
>>> # Ah, but that's not the log message I specified!
>>> % xxd dump-fragment.txt
>>> 0000040: 380a 0a4b 2037 0a73 766e 3a6c 6f67 0a56 8..K 7.svn:log.V
>>> 0000050: 2031 3031 0a42 6974 6275 636b 6574 2072 101.Bitbucket r
>>> 0000060: e973 6572 76e9 20e0 2064 6576 2f6e 756c .serv. . dev/nul
>>> # It dropped these bytes:                        ^    ^ ^
>>>
>>>> Barry, got time to test/trace it?
>> Hm, that's not nice. Silently dropped bytes aren't good. The user should
>> at least be informed about what's happening...
>>
>
> +1 (want to write the patch?)
>
> Daniel
> (who won't have time to review patches in the near future)

There have been some new thoughts about the remaining validations in
http://subversion.tigris.org/servlets/ReadMsg?listName=dev&msgNo=141457
amounting to not validating log messages traveling towards the user.

Answering the original question:

On Sun, Jul 6, 2008 at 5:23 AM, Barry Scott <barry_at_barrys-emacs.org> wrote:
> Using the svn_client API, is it possible for a client to write
> non-UTF-8 log messages?

No, it is not possible to send a non-UTF8 log message using the svn
cmdline client, since it performs a conversion to UTF8-with-LF.

It is, however, possible to do so using any other, more lenient client.
But since the patch for issue 1796 was committed (around 6 Jun 2008), the
svn *server* rejects all non-UTF8 log messages, whichever client sends them.
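
For illustration, the kind of check described here (well-formed UTF-8
with LF-only line endings) can be sketched in plain C as below. This is
a minimal sketch, not the code from r31614 or libsvn_repos. The helper
names is_valid_utf8 and has_only_lf_eols are hypothetical, and treating
any CR byte as an error is an assumption about what "UTF8-with-LF" means.

```c
#include <stddef.h>

/* Hypothetical helper, not a Subversion API: return 1 if the LEN bytes
   at DATA form well-formed UTF-8, else 0.  Rejects overlong forms,
   UTF-16 surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *data, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = data[i];
      size_t extra;       /* number of continuation bytes expected */
      unsigned long cp;   /* code point decoded so far */
      unsigned long min;  /* smallest code point legal for this length */
      size_t k;

      if (c < 0x80)                { extra = 0; cp = c;        min = 0; }
      else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;  /* stray continuation byte or invalid lead byte */

      if (extra > len - i - 1)
        return 0;  /* sequence is truncated */

      for (k = 1; k <= extra; k++)
        {
          if ((data[i + k] & 0xC0) != 0x80)
            return 0;  /* not a continuation byte */
          cp = (cp << 6) | (data[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;  /* overlong, out of range, or a surrogate */

      i += extra + 1;
    }
  return 1;
}

/* Hypothetical helper: return 1 if DATA contains no carriage return,
   i.e. every line ending is a bare LF. */
static int has_only_lf_eols(const unsigned char *data, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (data[i] == '\r')
      return 0;
  return 1;
}
```

With these helpers, a message containing the Latin-1 bytes e9 and e0
from dump-fragment.txt fails is_valid_utf8, and a message with CRLF line
endings fails has_only_lf_eols.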

The dropped-bytes issue above is not yet accounted for, but is probably
caused by that conversion in the svn cmdline client.

(I guess it's that "translate_string" line of code that I switched off
in my 2nd attachment to issue 1796 on the issue tracker site, trying to
prove a point.

Index: subversion/svn/util.c
===================================================================
--- subversion/svn/util.c (revision 31304)
+++ subversion/svn/util.c (working copy)
@@ -651,14 +651,10 @@
  svn_stringbuf_appendcstr(default_msg, APR_EOL_STR APR_EOL_STR);

  *tmp_file = NULL;
- if (lmb->message)
+ if (1)
    {
- svn_string_t *log_msg_string = svn_string_create(lmb->message, pool);
-
- SVN_ERR_W(svn_subst_translate_string(&log_msg_string, log_msg_string,
- lmb->message_encoding, pool),
- _("Error normalizing log message to internal format"));
-
+ SVN_ERR(svn_cmdline_printf(pool, "*** TEST BUILD: FORGING COMMIT MESSAGE ***\n"));
+ svn_string_t *log_msg_string = svn_string_create("forged\r\ncommit\r\nmessage\r\n", pool);
       *log_msg = log_msg_string->data;

       /* Trim incoming messages the EOF marker text and the junk that

)
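
The xxd output quoted earlier is consistent with every byte >= 0x80
being discarded: e9, e9 and e0 are missing and everything else is
intact. The sketch below reproduces just that symptom in plain C. It is
an assumption made for illustration, not a description of what
svn_subst_translate_string or the cmdline client actually does, and
drop_non_ascii is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper, not Subversion code: copy SRC to DST, skipping
   every byte >= 0x80.  DST must have room for LEN bytes.  Returns the
   number of bytes written to DST. */
static size_t drop_non_ascii(unsigned char *dst,
                             const unsigned char *src,
                             size_t len)
{
  size_t i;
  size_t n = 0;

  for (i = 0; i < len; i++)
    if (src[i] < 0x80)
      dst[n++] = src[i];  /* keep plain ASCII, silently lose the rest */
  return n;
}
```

Fed the 28 bytes "Bitbucket r\xe9serv\xe9 \xe0 dev/null", this yields
the 25 bytes "Bitbucket rserv  dev/null" (with two spaces), which is the
byte sequence visible in the revprops dump. A conversion that fails
loudly on such input, instead of filtering it, would avoid the silent
loss.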

-- 
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696  mobile: +49 177 2345869  fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

Received on 2008-08-05 01:51:51 CEST

This is an archived mail posted to the Subversion Dev mailing list.