[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Issue #2748: non-UTF-8 filenames in the repository

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Wed, 17 Sep 2008 11:05:39 -0400

Daniel Shahaf wrote:
> C. Michael Pilato wrote on Wed, 17 Sep 2008 at 10:44 -0400:
>> Daniel Shahaf wrote:
>>> (And I'll also fix the comments in svn_fs.h to say explicitly that
>>> control characters aren't allowed; svn_path_check_valid() already
>>> forbids them.)
>> Hrm... control characters *are* allowed, though. What you see in svn_fs.h
>> is the full set of requirements *by the FS layer* for its paths. UTF-8 with
>> no 0-offset characters. That's the whole story. To add control characters
>> to the blacklist now is a violation of the API contract. If that means that
>> svn_path_check_valid() isn't the right function to use for this validation
>> -- or that it needs a new boolean flag that toggles the checks for control
>> characters -- so be it.
>>
>
> But, in this case, when *should* svn_path_check_valid() be used? Why
> does it check for control characters -- which of its callers need this
> check?

There are two different set of requirements in the mix here.

libsvn_fs doesn't care about control characters, nor does it have any reason
to. It's just a versioned filesystem, right? It doesn't have hooks, or
print output, or speak XML. In fact, it doesn't use the paths provided it
in any way that doesn't involve its own virtual storage system. The only
reason it has the one little can't-have-a-null-byte rule is because without
it we couldn't pass C strings around, and custom counted strings would be a
drag to maintain.

The *rest* of Subversion cares, though. You can't pass control characters
over our XML WebDAV reports. You can't store them in XML .svn/entries files
(back when we had such). They look ugly in screen output. That's why the
Subversion *client* layers protect against such things. And, arguably,
mod_dav_svn and svnserve (as entry points to the Subversion system) should
be performing similar checks of their network input.

This duality has always existed in Subversion, and it makes perfect sense.
That's the beauty of modularized code and solid APIs. Sure, you can use the
FS APIs to inject paths into your filesystem that the rest of Subversion
will croak on, but that's an understood risk. What we want to disallow are:

   * Using the FS APIs to introduce paths that violate the FS constraints

   * Using higher-level Subversion access points to introduce paths that
     violate the requirements of those higher-level Subversion layers.

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2008-09-17 17:05:58 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.