[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[RFC] Checking paths are legal

From: Jon Foster <jon_at_jon-foster.co.uk>
Date: 2004-09-02 01:08:58 CEST

Hi,

Currently, if a Windows user checks out a directory called "aux",
they get an incomprehensible Subversion error. This is because
"aux", "nul", etc and names containing ":" cannot be used as file or
directory names on Windows. I'd like to make the error messages
explicit about the problem.

Also, Subversion does not support files where the filename contains
control characters (i.e. ASCII or [equivalently] Unicode values less
than 32). Currently adding one of these files silently corrupts the
working copy, and all further Subversion operations fail. I'd like
to put in a sane error here, and not break the working copy. This
isn't a problem on Windows, which doesn't support such files anyway,
but can be a problem on Linux.

Related issues:
   Issue 1277 (Windows doesn't support files containing ':')
   Issue 1932 (Control characters are not and never will be supported)
   Issue 1954 (Control characters should be rejected gracefully)
   Issue 1830 (Newlines should be rejected gracefully)

Proposal:

Since there are two different reasons why a path may be invalid,
I propose adding two new functions to check paths and two new error
codes to propogate this information.

We could re-use SVN_ERR_BAD_FILENAME for one or both of the errors.
However, I would prefer to use a new error code for at least the
"not supported on your OS" error so this case can be unambiguously
identified. The way SVN_ERR_BAD_FILENAME is currently used makes
me think it isn't well suited to be used as the "not supported by
Subversion" error, either, but that isn't a strong opinion. I'm
open to suggestions for good names for these errors - I'm
tentatively calling them SVN_ERR_WC_INVALID_PATH_ARCH and
SVN_ERR_WC_INVALID_PATH.

I'm not sure whether it is better for the check functions to throw
the errors themselves or to just return a boolean?

Then we need to add the checks everywhere. In general, commands
that work on repositories directly need to check only that the
filenames are legal in Subversion (i.e. no control characters),
but commands that access files on disk also need to check that
the filenames are valid on disk. More specifically:
- The working-copy library needs checks adding to ensure it only
   handles paths that are legal both on the filesystem and in
   Subversion. This should catch the update and checkout of an
   unsupported filename, which are probably the most common cases,
   and also catch "svn add" of a filename containing control
   characters.
- The ra client code needs to ensure that paths sent by the server
   are legal both on the filesystem and in Subversion. It can
   probably delegate this to the working-copy library on checkout or
   update, but it probably needs to handle this on export[*].
- It would be nice if the commandline client ensured that URLs are
   valid[*] (e.g. when doing "svn mv URL1 URL2"), but this is not
   essential so long as the ra libraries check this.
- The ra server code needs to ensure that paths sent by the client
   are legal in Subversion[*], but it doesn't matter whether or not
   they are valid on the server's OS as they're not going to be put
   into normal files. (Note that the DAV protocol doesn't allow
   invalid paths to be specified, so the DAV server may already do
   this implicitly. The svnserve server probably needs to check
   explicitly). It may be possible to delegate this to the fs
   library.
- "svnadmin load" should check paths in the dumpfile are valid in
   Subversion[*]. It may be possible to delegate this to the fs
   library.
- The fs library may need to do some checks if higher levels
   don't. [*]
- Probably some more I haven't thought of yet.[*]

We also need lots of testcases for this.[*]

Attached is a FIRST DRAFT of a patch. It doesn't yet do the things
marked with [*] above, and it has had minimal testing.

I also have a release scheduling question. When (if ever) should we
plan on putting this change into the official release? I can see
five options here:
0) Reject this idea totally.
1) Schedule this for 1.2.0
2) Schedule a cut-down version of this for 1.1.x, x>0, that doesn't
    include the new public API. This means the is-valid-filename
    tests would have to be duplicated as a library-private function
    in each libary, and we would have to re-use an existing error
    code. Yuck!
3) Delay 1.1.0 until this patch is finished. I do not think this
    is a good idea - it will take me a couple of weeks to get this
    finished, tested, etc, and it is quite a big change. (I would
    like to write this patch myself - it seems about the right level
    of difficulty for a Subversion newbie and will help me learn
    about all the various parts of Subversion. However, if this
    had to go into 1.1.0 then someone else with more free time would
    need to take over).
4) Put the new public API in 1.1.0, but do not call it. That makes
    it low-risk in 1.1.0, but we do need to get the API design issues
    sorted out quickly. Schedule the rest of this change for 1.1.1
    or 1.1.2.
My preferred solution is option (4). This change will be very useful
for Windows users, so I'd rather they didn't have to wait until
Subversion 1.2.

Comments on any of the above are welcome. Sorry this mail was so
long!

Thanks & regards,

Jon Foster

--
Log message for the patch (just because people have said it helps
them to review draft patches):
Check paths are valid
* subversion/include/svn_error_codes.h
     (SVN_ERR_WC_INVALID_PATH, SVN_ERR_WC_INVALID_PATH_ARCH): New errors
* subversion/include/svn_path.h
* subversion/libsvn_subr/path.c
     (svn_path_is_valid_in_svn, svn_path_is_valid_on_disk): New functions
* src-trunk/subversion/libsvn_wc/copy.c
     (svn_wc_copy): Check path valid both on disk and in svn
* subversion/libsvn_wc/adm_ops.c
     (svn_wc_add): Check path valid both on disk and in svn
* subversion/libsvn_wc/update_editor.c
     (add_directory, add_or_open_file): Check path valid both on disk and in svn
* subversion/libsvn_ra_local/ra_plugin.c:
     (svn_ra_local_do_check_path): Check path valid in svn
* subversion/libsvn_ra_svn/client.c:
     (ra_svn_check_path): Check path valid in svn
* subversion/libsvn_ra_dav/props.c:
     (svn_ra_dav_do_check_path): Check path valid in svn

Index: src-trunk/subversion/include/svn_error_codes.h
===================================================================
--- src-trunk/subversion/include/svn_error_codes.h (revision 10797)
+++ src-trunk/subversion/include/svn_error_codes.h (working copy)
@@ -361,6 +361,14 @@
               SVN_ERR_WC_CATEGORY_START + 22,
               "Path syntax not supported in this context")
 
+ SVN_ERRDEF (SVN_ERR_WC_INVALID_PATH,
+ SVN_ERR_WC_CATEGORY_START + 23,
+ "Path name is not allowed by Subversion")
+
+ SVN_ERRDEF (SVN_ERR_WC_INVALID_PATH_ARCH,
+ SVN_ERR_WC_CATEGORY_START + 24,
+ "Path name is not allowed by this OS")
+
   /* fs errors */
 
   SVN_ERRDEF (SVN_ERR_FS_GENERAL,
Index: src-trunk/subversion/include/svn_path.h
===================================================================
--- src-trunk/subversion/include/svn_path.h (revision 10797)
+++ src-trunk/subversion/include/svn_path.h (working copy)
@@ -327,10 +327,55 @@
  */
 svn_boolean_t svn_path_is_single_path_component (const char *name);
 
+/**
+ * @since New in 1.1.
+ *
+ * Test to see if a path is allowed in a Subversion repository.
+ * Paths containing ASCII control characters (1-31) are forbidden.
+ * Currently, all other paths are allowed.
+ *
+ * This must be OS-independent - it is used to decide what paths
+ * are allowed to be added to a Subversion repository, so the
+ * file either already exists on disk (in the case of svn add)
+ * or may never exist on disk (in the case of svn mv with two
+ * URLs).
+ *
+ * Note that this does NOT guarantee that the path can be checked
+ * out on this particular operating system - for that, use
+ * svn_path_is_valid_on_disk().
+ *
+ * May be called with a path component or a complete path. The
+ * path must not be URL-encoded.
+ *
+ * If path is invalid, return @c FALSE.
+ * If path is valid, return @c TRUE.
+ */
+svn_boolean_t svn_path_is_valid_in_svn (const char *path);
 
 /**
  * @since New in 1.1.
  *
+ * Test to see if a path is allowed in a Subversion working copy.
+ * This is operating-system dependent - for example Windows
+ * prohibits files and directories called "aux" but Linux allows
+ * them.
+ *
+ * This is a subset of svn_path_is_valid_in_svn - i.e. for any path
+ * that svn_path_is_valid_in_svn returns @c FALSE, this returns
+ * @c FALSE.
+ *
+ * May be called with a path component or a complete path. The
+ * path must not be URL-encoded and (on Windows) must not contain
+ * a drive letter.
+ *
+ * If path is invalid, return @c FALSE.
+ * If path is valid, return @c TRUE.
+ */
+svn_boolean_t svn_path_is_valid_on_disk (const char *path);
+
+/**
+ * @since New in 1.1.
+ *
  * Test to see if a backpath, i.e. '..', is present in @a path.
  * If not, return @c FALSE.
  * If so, return @c TRUE.
Index: src-trunk/subversion/libsvn_subr/path.c
===================================================================
--- src-trunk/subversion/libsvn_subr/path.c (revision 10797)
+++ src-trunk/subversion/libsvn_subr/path.c (working copy)
@@ -678,8 +678,156 @@
   return TRUE;
 }
 
+svn_boolean_t
+svn_path_is_valid_in_svn (const char *path)
+{
+ const char *p;
+ for (p = path; *p; p++)
+ {
+ if ((unsigned char)*p < (unsigned char)32u)
+ return FALSE;
+ }
+ return TRUE;
+}
 
+#ifdef WIN32
+/* Check to see if a path segment matches the Windows reserved names.
+
+ These rules are taken from
+ http://msdn.microsoft.com/library/en-us/fileio/base/naming_a_file.asp
+
+ The following reserved device names cannot be used as the name
+ of a file:
+ CON, PRN, AUX, NUL,
+ COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9,
+ LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9.
+ Also avoid these names followed by an extension (for example,
+ NUL.tx7).
+
+ Windows NT: CLOCK$ is also a reserved device name.
+
+ These names are case-insensetive.
+
+ The path segment passed is defined as:
+
+ /some/path_to/a_file\0
+ ^ ^^ ^^ ^
+ | || || +--segment_end 3
+ | || |+--------segment_start 3
+ | || +--segment_end 2
+ | |+---------segment_start 2
+ | +--segment_end 1
+ +------segment_start 1
+
+ I.e. segment_end always points to a '/' or '\0'.
+ */
+static svn_boolean_t
+path_segment_is_valid_on_disk (const char *segment_start,
+ const char *segment_end)
+{
+ /* We can decide whether or not it is valid based only on
+ the first 7 characters. This avoids dynamic memory allocation
+ and is simpler than trying to look at the original string.
+ 7 characters is "CLOCK$z" (valid), with 6 characters we would
+ be unable to distinguish that from "CLOCK$" (invalid).
+ Note that since using these identifiers with an extension is
+ prohibited, we truncate just before any period. We also
+ take this opertunity to lowercase the string so we don't
+ need to do case-insensetive comparisons later.
+ */
+ char buf[8];
+ apr_size_t i;
+ apr_size_t length = segment_end - segment_start;
+ if (length < 3)
+ return TRUE;
+ if (length > 7)
+ length = 7;
+ for (i = 0; i < length; i++)
+ {
+ if (segment_start[i] == '.')
+ {
+ length = i;
+ break;
+ }
+ buf[i] = apr_tolower(segment_start[i]);
+ }
+ buf[length] = '\0';
+
+ if (length == 3)
+ {
+ if (!strcmp(buf, "aux") || !strcmp(buf, "con")
+ || !strcmp(buf, "nul") || !strcmp(buf, "prn"))
+ return FALSE;
+ }
+ else if (length == 4)
+ {
+ if ((!strncmp(buf, "con", 3) || !strncmp(buf, "lpt", 3))
+ && apr_isdigit(buf[4]))
+ return FALSE;
+ }
+ else if (length == 6)
+ {
+ if (!strcmp(buf, "clock$"))
+ return FALSE;
+ }
+ return TRUE;
+}
+
+/* Copied from apr_c_is_fnchar in apr/file_io/win32/filesys.c and
+ IS_FNCHAR in apr/arch/win32/apr_arch_fileio.c and simplified.
+ There doesn't seem to be a public API for this APR functionality.
+ */
+static const char svn_c_is_fnchar[256] =
+{/* Reject all ctrl codes... */
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ /* ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? */
+ 1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,0, 1,1,1,1,1,1,1,1,1,1,0,1,0,1,0,0,
+ /* @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ */
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,
+ /* ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ */
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,
+ /* High bit codes are accepted (subject to utf-8->Unicode xlation) */
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
+};
+#define SVN_IS_FNCHAR(c) (svn_c_is_fnchar[(unsigned char)(c)])
+
+#endif /* def WIN32 */
+
 svn_boolean_t
+svn_path_is_valid_on_disk (const char *path)
+{
+#ifdef WIN32
+ const char *segment_start = path;
+ const char *p = path;
+ for (; *p; p++)
+ {
+ if (*p == '/')
+ {
+ if (!path_segment_is_valid_on_disk(segment_start, p))
+ return FALSE;
+ segment_start = p + 1;
+ }
+ else if (!SVN_IS_FNCHAR(*p))
+ return FALSE;
+ }
+ if (!path_segment_is_valid_on_disk(segment_start, p))
+ return FALSE;
+ return TRUE;
+#else /* ndef WIN32 */
+ /* TODO FIXME Windows and Linux are not the only operating systems
+ in the world.
+ */
+ /* Allows all paths. For Linux this is right, for other OS this
+ is a best guess and should be fixed.
+ */
+ return svn_path_is_valid_in_svn(path);
+#endif
+}
+
+svn_boolean_t
 svn_path_is_backpath_present (const char *path)
 {
   int len = strlen (path);
Index: src-trunk/subversion/libsvn_wc/copy.c
===================================================================
--- src-trunk/subversion/libsvn_wc/copy.c (revision 10797)
+++ src-trunk/subversion/libsvn_wc/copy.c (working copy)
@@ -380,6 +380,15 @@
 
   SVN_ERR (svn_io_check_path (src_path, &src_kind, pool));
   
+ if (!svn_path_is_valid_in_svn(dst_basename))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ _("The path '%s' is not allowed in a Subversion"
+ " repository"), dst_basename);
+ else if (!svn_path_is_valid_on_disk(dst_basename))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH_ARCH, NULL,
+ _("The path '%s' is not allowed by your OS"),
+ dst_basename);
+
   if (src_kind == svn_node_file)
     SVN_ERR (copy_file_administratively (src_path, adm_access,
                                          dst_parent, dst_basename,
Index: src-trunk/subversion/libsvn_wc/adm_ops.c
===================================================================
--- src-trunk/subversion/libsvn_wc/adm_ops.c (revision 10797)
+++ src-trunk/subversion/libsvn_wc/adm_ops.c (working copy)
@@ -879,6 +879,15 @@
   apr_uint32_t modify_flags = 0;
   svn_wc_adm_access_t *adm_access;
   
+ if (!svn_path_is_valid_in_svn(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ _("The path '%s' is not allowed in a Subversion"
+ " repository"), path);
+ else if (!svn_path_is_valid_on_disk(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH_ARCH, NULL,
+ _("The path '%s' is not allowed by your OS"),
+ path);
+
   /* Make sure something's there. */
   SVN_ERR (svn_io_check_path (path, &kind, pool));
   if (kind == svn_node_none)
Index: src-trunk/subversion/libsvn_wc/update_editor.c
===================================================================
--- src-trunk/subversion/libsvn_wc/update_editor.c (revision 10797)
+++ src-trunk/subversion/libsvn_wc/update_editor.c (working copy)
@@ -1013,6 +1013,16 @@
       || ((! copyfrom_path) && (SVN_IS_VALID_REVNUM (copyfrom_revision))))
     abort();
 
+ /* Check path is allowed */
+ if (!svn_path_is_valid_in_svn(db->path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ _("Failed to add directory '%s': Path is not"
+ " allowed in a Subversion repository"), db->path);
+ else if (!svn_path_is_valid_on_disk(db->path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH_ARCH, NULL,
+ _("Failed to add directory '%s': Path is not"
+ " allowed by your OS"), db->path);
+
   /* There should be nothing with this name. */
   SVN_ERR (svn_io_check_path (db->path, &kind, db->pool));
   if (kind != svn_node_none)
@@ -1425,10 +1435,26 @@
   const svn_wc_entry_t *entry;
   svn_node_kind_t kind;
   svn_wc_adm_access_t *adm_access;
+ apr_pool_t *subpool;
 
+ if (!svn_path_is_valid_in_svn(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ adding ? _("Failed to add file '%s': Path "
+ "is not allowed in a Subversion "
+ "repository")
+ : _("Failed to update file '%s': Path "
+ "is not allowed in a Subversion "
+ "repository"), path);
+ else if (!svn_path_is_valid_on_disk(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH_ARCH, NULL,
+ adding ? _("Failed to add file '%s': Path "
+ "is not allowed by your OS")
+ : _("Failed to update file '%s': Path "
+ "is not allowed by your OS"), path);
+
   /* the file_pool can stick around for a *long* time, so we want to use
      a subpool for any temporary allocations. */
- apr_pool_t *subpool = svn_pool_create (pool);
+ subpool = svn_pool_create (pool);
 
   /* ### kff todo: if file is marked as removed by user, then flag a
      conflict in the entry and proceed. Similarly if it has changed
Index: src-trunk/subversion/libsvn_client/copy.c
===================================================================
--- src-trunk/subversion/libsvn_client/copy.c (revision 10797)
+++ src-trunk/subversion/libsvn_client/copy.c (working copy)
@@ -750,6 +750,7 @@
                   svn_client_ctx_t *ctx,
                   apr_pool_t *pool)
 {
+ /* ### TODO FIXME make sure dst_path is legal... */
   void *ra_baton, *sess;
   svn_ra_plugin_t *ra_lib;
   svn_node_kind_t src_kind, dst_kind;
Index: src-trunk/subversion/libsvn_ra_local/ra_plugin.c
===================================================================
--- src-trunk/subversion/libsvn_ra_local/ra_plugin.c (revision 10797)
+++ src-trunk/subversion/libsvn_ra_local/ra_plugin.c (working copy)
@@ -656,6 +656,11 @@
   svn_fs_root_t *root;
   const char *abs_path = sbaton->fs_path;
 
+ if (path && !svn_path_is_valid_in_svn(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ _("The path '%s' is not allowed in a Subversion"
+ " repository"), path);
+
   /* ### Not sure if this counts as a workaround or not. The
      session baton uses the empty string to mean root, and not
      sure that should change. However, it would be better to use
Index: src-trunk/subversion/libsvn_ra_svn/client.c
===================================================================
--- src-trunk/subversion/libsvn_ra_svn/client.c (revision 10797)
+++ src-trunk/subversion/libsvn_ra_svn/client.c (working copy)
@@ -1089,6 +1089,11 @@
   svn_ra_svn_conn_t *conn = sess->conn;
   const char *kind_word;
 
+ if (path && !svn_path_is_valid_in_svn(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ _("The path '%s' is not allowed in a Subversion"
+ " repository"), path);
+
   SVN_ERR(svn_ra_svn_write_cmd(conn, pool, "check-path", "c(?r)", path, rev));
   SVN_ERR(handle_auth_request(sess, pool));
   SVN_ERR(svn_ra_svn_read_cmd_response(conn, pool, "w", &kind_word));
Index: src-trunk/subversion/libsvn_ra_dav/props.c
===================================================================
--- src-trunk/subversion/libsvn_ra_dav/props.c (revision 10797)
+++ src-trunk/subversion/libsvn_ra_dav/props.c (working copy)
@@ -1098,6 +1098,11 @@
   svn_error_t *err;
   svn_boolean_t is_dir;
 
+ if (path && !svn_path_is_valid_in_svn(path))
+ return svn_error_createf (SVN_ERR_WC_INVALID_PATH, NULL,
+ _("The path '%s' is not allowed in a Subversion"
+ " repository"), path);
+
   /* ### For now, using svn_ra_dav__get_baseline_info() works because
      we only have three possibilities: dir, file, or none. When we
      add symlinks, we will need to do something different. Here's one

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Sep 2 01:09:02 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.