Re: issue #1796

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Sun, 25 May 2008 00:21:19 +0200

Karl Fogel wrote:
> Neels Janosch Hofmeyr <neels_at_elego.de> writes:
>> I am busy on a fix for issue
>> #1796: defective or malicious client can corrupt repository log messages
>> and would like to validate that property values are good UTF8.
>> There is a private function in subversion/libsvn_subr/utf.c:
>> check_utf8(const char *data, apr_size_t len, apr_pool_t *pool)
>> All of the *public* methods in subversion/include/svn_utf.h produce new
>> data. I just need a validation.
>> Is it ok to publish check_utf8 in subversion/include/svn_utf.h? Is that
>> "just another fix" or does it have release implications? And how would I
>> go about it; it would probably have to be called svn_utf_check_utf8... ?
> You're saying you need a public function for confirming that a string is
> valid utf8? Sure, you can just move the private function to public,
> giving it the "svn_utf_" prefix and changing all callers. See
> http://subversion.tigris.org/hacking.html#other-conventions for the formal
> conventions. You'd need to put a "@since New in 1.6" tag on the public
> doc string.
> But why do you need it to be public? Could you explain in more detail?

Sure. So, we've got the Subversion server (libsvn_repos), which accepts
any garbage written in commit log messages. It doesn't enforce
consistent line endings and, AFAIK, it doesn't check whether the log
message is valid UTF-8 either, which it should be according to the spec.

Hm, maybe I should test that before making statements...
...tested and my statement holds.

From Wikipedia, I gather that 0xff never appears in UTF-8:

"UTF-8 strings can be fairly reliably recognized as such by a simple
algorithm. That is, the probability that a string of characters in any
other encoding appears as valid UTF-8 is low, diminishing with
increasing string length. For instance, the octet values C0, C1, and F5
to FF never appear."
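
To illustrate the rule quoted above, here is a minimal sketch of a UTF-8 well-formedness check in C. It is not Subversion's check_utf8(); the name is_valid_utf8 is hypothetical, and the byte ranges follow the well-formed sequences of RFC 3629 as I understand them.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper (NOT Subversion's check_utf8): return true if the
   LEN bytes at DATA are well-formed UTF-8 in the sense of RFC 3629. */
static bool
is_valid_utf8(const unsigned char *data, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = data[i];
      size_t extra;                        /* continuation bytes expected */
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c <= 0x7F)
        {
          i++;                             /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        extra = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          extra = 2;
          if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
          if (c == 0xED) hi = 0x9F;        /* reject UTF-16 surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          extra = 3;
          if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
          if (c == 0xF4) hi = 0x8F;        /* reject values above U+10FFFF */
        }
      else
        return false;  /* 0x80..0xC1 and 0xF5..0xFF cannot start a sequence */

      if (len - i <= extra)
        return false;                      /* sequence is truncated */
      if (data[i + 1] < lo || data[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= extra; k++)
        if ((data[i + k] & 0xC0) != 0x80)
          return false;                    /* not a continuation byte */

      i += extra + 1;
    }
  return true;
}
```

With this sketch, a buffer containing 'a', 0xff, 'b' is rejected, because 0xff can never start (or continue) a sequence.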

Thus, I patched my svn client to commit the following log message:
  char msg[4] = { 'a', 0xff, 'b', 0 };

which results in the following output:

+ svn ci -m 'ignored message'
Adding x
Transmitting file data .
Committed revision 1.

+ svn log x
+ cat -A
------------------------------------------------------------------------$
r1 | (no author) | 2008-05-25 00:11:53 +0200 (Sun, 25 May 2008) | 1 line$
------------------------------------------------------------------------$

+ svn log --xml -r1
+ cat -A
<?xml version="1.0"?>$
+ cd ../..
+ pwd

+ svnadmin dump --incremental -r1 repos/r1796
* Dumped revision 1.
+ cat -A
SVN-fs-dump-format-version: 2$
V 3$
K 8$
V 27$

So, picking up from above, libsvn-repos accepts log messages in
non-UTF-8 and/or inconsistent line ending style and stores them in the

As far as I have understood, we are obligated to disallow this.

So, I would like to check incoming log messages (or even all incoming
svn: prop values, depending on whether this is nice) for their
compliance with UTF-8 and LF line ending style.
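
As a rough sketch of the line-ending half of that check: assuming "LF line ending style" means that no carriage return may occur anywhere in the value, the test reduces to a scan for '\r'. The function name below is hypothetical and not part of any Subversion API.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper: return true if the LEN bytes at DATA contain no
   carriage return, i.e. every line break is a bare LF.
   Assumption: a lone CR counts as a violation just like CRLF does. */
static bool
has_only_lf_line_endings(const char *data, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (data[i] == '\r')
      return false;
  return true;
}
```

A value such as "a\nb" passes, while "a\r\nb" and "a\rb" are both rejected.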

Checking for LF is easy. For UTF-8, there is this function in
subversion/libsvn_subr/utf.c called check_utf8(..), which I gather I
cannot access from libsvn_repos unless it is made public in

So, I want to rename check_utf8 to svn_utf_check_utf8, put a "@since New
in 1.6" tag on the public doc string, publish it in include/svn_utf.h,
adjust all callers and use it in libsvn_repos/fs-wrap.c, in function

Am I on the right track here? :)


--
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696 mobile: +49 177 2345869 fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194


Received on 2008-05-25 00:21:47 CEST

This is an archived mail posted to the Subversion Dev mailing list.
