[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[PATCH] Fix svnadmin verify to fully verify a revision

From: Eric Gillespie <epg_at_google.com>
Date: Tue, 12 Feb 2008 18:59:47 -0800

At Google, we have a nasty kind of corruption that we have not
yet tracked down. You've seen David Glasser commit a few
different bug fixes in the last few months; each of those we
thought was what we were seeing, but whatever it is is still
lurking in the code base. It does not seem to be the same
corruption the fsfsverify.py folks have been dealing with.

Creating a new revision that changes /A/B/E/bravo means creating
new directory listings for /, /A, /A/B, and /A/B/E in the new
revision, with each entry not changed in the new revision a link
back to the entry in a previous revision. svn_repos_replay()ing
a revision does not verify that those links are correct.

In fsfs, these links look like this:

    K 1
    C
    V 17
    dir 7-1.0.r1/1568

That tells us to seek to 1568 in db/revs/1 to find A/C. What we
keep encountering are revs files where that offset 1568 is
*wrong*; at that offset you don't find A/C; you're reading in the
middle of some other data.

This patch makes svnadmin verify that subtle part of a revision.
I think this should go into 1.5, but I marked svn_repos_verify_fs
as being new in 1.6. If we don't push this into 1.5, we need to
revert r24559 on the 1.5.x branch. It makes svnadmin verify
unusably slow on large repositories, but at least when it does
finally finish, you know it told you the truth.

[[[
Implement svn_repos_verify_fs and use it in svnadmin verify rather
than using svn_repos_dump_fs2, restoring complete verification after
losing it in r24559.

In r24559, I changed verify's use of svn_repos_dump_fs2 to dump only
the paths changed in the revision being dumped rather than dumping the
entire repository as of that revision. Without r24559, verifying
large repositories is effectively impossible. But without dumping the
entire repository contents, we were no longer verifying the links back
to previous revisions for the paths *not* changed in the revision
being verified.

I don't have bdb built, and thus have not tested with bdb. If
some kind bdb person would test that svnadmin verify still works
there, I would appreciate it. You won't actually find any
corruption, hopefully ;->.

* subversion/libsvn_fs_fs/fs_fs.c
  (svn_fs_fs__get_node_revision): Check that the HEADER_ID field is
    actually in the node-rev header block; all other fields are
    already checked, only this one was not. This led to a segmentation
    fault in the particular corrupted revs file I created, as I simply
    incremented an offset, resulting in a "d" header but no "id" header.

* subversion/libsvn_repos/dump.c
  (verify_directory_entry): Add function that does a simple check that
    a directory entry is valid (svn_fs_dir_entries for directories and
    svn_fs_file_length for files).
  (verify_close_directory): Add svn_delta_editor_t.close_directory
    implementation that verify_directory_entry()s each entry.
  (svn_repos_verify_fs): Add new function to verify a file system,
    mostly by invoking svn_repos_replay2 with the dump editor, but
    with the additional check of verify_close_directory.

* subversion/include/svn_repos.h
  (svn_repos_verify_fs): Declare.
  (svn_repos_dump_fs2): Change doc string to point to
    svn_repos_verify_fs for verifying a file system.

* subversion/svnadmin/main.c
  (dump_repo): Remove.
  (subcommand_dump): Fold dump_repo body into here.
  (subcommand_verify): Use svn_repos_verify_fs rather than dump_repo.

* subversion/tests/cmdline/svnadmin_tests.py
  (verify_incremental_fsfs): Add test that corrupts a revision file in
    a way that svn_repos_dump_fs2 cannot detect and asserts that
    svnadmin verify detects it.
]]]

Index: subversion/svnadmin/main.c
===================================================================
--- subversion/svnadmin/main.c (revision 29308)
+++ subversion/svnadmin/main.c (working copy)
@@ -649,13 +649,9 @@
 }
 
 
-/* Common implementation for dump and verify. First three parameters mirror
- the 'svn_opt_subcommand_t' type. The DUMP_CONTENTS parameter determines
- whether to send the dump to stdout or an empty stream. */
+/* This implements `svn_opt_subcommand_t'. */
 static svn_error_t *
-dump_repo(apr_getopt_t *os, void *baton,
- apr_pool_t *pool, svn_boolean_t dump_contents,
- svn_boolean_t incremental)
+subcommand_dump(apr_getopt_t *os, void *baton, apr_pool_t *pool)
 {
   struct svnadmin_opt_state *opt_state = baton;
   svn_repos_t *repos;
@@ -690,19 +686,14 @@
       (SVN_ERR_CL_ARG_PARSING_ERROR, NULL,
        _("First revision cannot be higher than second"));
 
- /* Run the dump to STDOUT. Let the user redirect output into
- a file if they want. :-) */
- if (dump_contents)
- SVN_ERR(create_stdio_stream(&stdout_stream, apr_file_open_stdout, pool));
- else
- stdout_stream = NULL;
+ SVN_ERR(create_stdio_stream(&stdout_stream, apr_file_open_stdout, pool));
 
   /* Progress feedback goes to STDERR, unless they asked to suppress it. */
   if (! opt_state->quiet)
     stderr_stream = recode_stream_create(stderr, pool);
 
   SVN_ERR(svn_repos_dump_fs2(repos, stdout_stream, stderr_stream,
- lower, upper, incremental,
+ lower, upper, opt_state->incremental,
                              opt_state->use_deltas, check_cancel, NULL,
                              pool));
 
@@ -712,15 +703,6 @@
 
 /* This implements `svn_opt_subcommand_t'. */
 static svn_error_t *
-subcommand_dump(apr_getopt_t *os, void *baton, apr_pool_t *pool)
-{
- struct svnadmin_opt_state *opt_state = baton;
- return dump_repo(os, baton, pool, TRUE, opt_state->incremental);
-}
-
-
-/* This implements `svn_opt_subcommand_t'. */
-static svn_error_t *
 subcommand_help(apr_getopt_t *os, void *baton, apr_pool_t *pool)
 {
   struct svnadmin_opt_state *opt_state = baton;
@@ -1122,8 +1104,29 @@
 subcommand_verify(apr_getopt_t *os, void *baton, apr_pool_t *pool)
 {
   struct svnadmin_opt_state *opt_state = baton;
- return dump_repo(os, baton, pool, FALSE,
- opt_state->start_revision.kind != svn_opt_revision_unspecified);
+ svn_stream_t *stderr_stream;
+ svn_repos_t *repos;
+ svn_fs_t *fs;
+ svn_revnum_t youngest, lower, upper;
+
+ SVN_ERR(open_repos(&repos, opt_state->repository_path, pool));
+ fs = svn_repos_fs(repos);
+ SVN_ERR(svn_fs_youngest_rev(&youngest, fs, pool));
+
+ /* Find the revision numbers at which to start and end. */
+ SVN_ERR(get_revnum(&lower, &opt_state->start_revision,
+ youngest, repos, pool));
+ SVN_ERR(get_revnum(&upper, &opt_state->end_revision,
+ youngest, repos, pool));
+
+ if (opt_state->quiet)
+ stderr_stream = NULL;
+ else
+ stderr_stream = recode_stream_create(stderr, pool);
+
+ SVN_ERR(open_repos(&repos, opt_state->repository_path, pool));
+ return svn_repos_verify_fs(repos, stderr_stream, lower, upper,
+ check_cancel, NULL, pool);
 }
 
 
Index: subversion/include/svn_repos.h
===================================================================
--- subversion/include/svn_repos.h (revision 29279)
+++ subversion/include/svn_repos.h (working copy)
@@ -1937,12 +1937,41 @@
   svn_repos_load_uuid_force
 };
 
+
 /**
+ * Verify the contents of the file system in @a repos.
+ *
+ * If @a feedback_stream is not @c NULL, write feedback to it (lines of
+ * the form "* Verified revision %ld\n").
+ *
+ * If @a start_rev is @c SVN_INVALID_REVNUM, then start verifying at
+ * revision 0. If @a end_rev is @c SVN_INVALID_REVNUM, then verify
+ * through the @c HEAD revision.
+ *
+ * If @a cancel_func is not @c NULL, call it periodically with @a
+ * cancel_baton as argument to see if the caller wishes to cancel the
+ * verification.
+ *
+ * @since New in 1.6.
+ */
+svn_error_t *
+svn_repos_verify_fs(svn_repos_t *repos,
+ svn_stream_t *feedback_stream,
+ svn_revnum_t start_rev,
+ svn_revnum_t end_rev,
+ svn_cancel_func_t cancel_func,
+ void *cancel_baton,
+ apr_pool_t *pool);
+
+
+/**
  * Dump the contents of the filesystem within already-open @a repos into
  * writable @a dumpstream. Begin at revision @a start_rev, and dump every
  * revision up through @a end_rev. Use @a pool for all allocation. If
- * non-_at_c NULL, send feedback to @a feedback_stream. @a dumpstream can be
- * @c NULL for the purpose of verifying the repository.
+ * non-_at_c NULL, send feedback to @a feedback_stream. If @a dumpstream is
+
+ * @c NULL, this is effectively a primitive verify. It is not complete,
+ * however; see svn_fs_verify instead.
  *
  * If @a start_rev is @c SVN_INVALID_REVNUM, then start dumping at revision
  * 0. If @a end_rev is @c SVN_INVALID_REVNUM, then dump through the @c HEAD
Index: subversion/tests/cmdline/svnadmin_tests.py
===================================================================
--- subversion/tests/cmdline/svnadmin_tests.py (revision 29308)
+++ subversion/tests/cmdline/svnadmin_tests.py (working copy)
@@ -433,6 +433,137 @@
 
 #----------------------------------------------------------------------
 
+def verify_incremental_fsfs(sbox):
+ """Assert that svnadmin verify detects corruption svn_repos_dump_fs2 can't."""
+
+ # setup a repo with a directory 'c:hi'
+ sbox.build(create_wc = False)
+ repo_url = sbox.repo_url
+ E_url = sbox.repo_url + '/A/B/E'
+
+ # Create A/B/E/bravo in r2.
+ svntest.actions.run_and_verify_svn(None, None, [],
+ 'mkdir', '-m', 'log_msg',
+ E_url + '/bravo')
+ # Corrupt r2's reference to A/C by replacing "dir 7-1.0.r1/1568" with
+ # "dir 7-1.0.r1/1569" (increment offset) and updating the checksum for
+ # this directory listing to "c9b5a2d26473a4e28088673dda9df804" so that
+ # the listing itself is valid.
+ r2 = sbox.repo_dir + "/db/revs/0/2"
+ fp = open(r2, 'w')
+ fp.write("""id: 0-2.0.r2/0
+type: dir
+count: 0
+cpath: /A/B/E/bravo
+copyroot: 0 /
+
+PLAIN
+K 5
+alpha
+V 17
+file 3-1.0.r1/719
+K 4
+beta
+V 17
+file 4-1.0.r1/840
+K 5
+bravo
+V 14
+dir 0-2.0.r2/0
+END
+ENDREP
+id: 2-1.0.r2/181
+type: dir
+pred: 2-1.0.r1/1043
+count: 1
+text: 2 69 99 99 f63001f7fddd1842d8891474d0982111
+cpath: /A/B/E
+copyroot: 0 /
+
+PLAIN
+K 1
+E
+V 16
+dir 2-1.0.r2/181
+K 1
+F
+V 17
+dir 5-1.0.r1/1160
+K 6
+lambda
+V 17
+file 6-1.0.r1/597
+END
+ENDREP
+id: 1-1.0.r2/424
+type: dir
+pred: 1-1.0.r1/1335
+count: 1
+text: 2 316 95 95 bccb66379b4f825dac12b50d80211bae
+cpath: /A/B
+copyroot: 0 /
+
+PLAIN
+K 1
+B
+V 16
+dir 1-1.0.r2/424
+K 1
+C
+V 17
+dir 7-1.0.r1/1569
+K 1
+D
+V 17
+dir 8-1.0.r1/3061
+K 2
+mu
+V 18
+file i-1.0.r1/1451
+END
+ENDREP
+id: 0-1.0.r2/692
+type: dir
+pred: 0-1.0.r1/3312
+count: 1
+text: 2 558 121 121 c9b5a2d26473a4e28088673dda9df804
+cpath: /A
+copyroot: 0 /
+
+PLAIN
+K 1
+A
+V 16
+dir 0-1.0.r2/692
+K 4
+iota
+V 18
+file j-1.0.r1/3428
+END
+ENDREP
+id: 0.0.r2/904
+type: dir
+pred: 0.0.r1/3624
+count: 2
+text: 2 826 65 65 e44e4151d0d124533338619f082c8c9a
+cpath: /
+copyroot: 0 /
+
+_0.0.t1-1 add false false /A/B/E/bravo
+
+
+904 1031
+""")
+ fp.close()
+
+ output, errput = svntest.main.run_svnadmin("verify", "-r2", sbox.repo_dir)
+ svntest.verify.verify_outputs(
+ message=None, actual_stdout=output, actual_stderr=errput,
+ expected_stdout=None,
+ expected_stderr=".*Found malformed header in revision file")
+
+#----------------------------------------------------------------------
+
 def recover_fsfs(sbox):
   "recover a repository (FSFS only)"
   sbox.build()
@@ -706,6 +837,7 @@
               hotcopy_format,
               setrevprop,
               verify_windows_paths_in_repos,
+ SkipUnless(verify_incremental_fsfs, svntest.main.is_fs_type_fsfs),
               SkipUnless(recover_fsfs, svntest.main.is_fs_type_fsfs),
               load_with_parent_dir,
               set_uuid,
Index: subversion/libsvn_repos/dump.c
===================================================================
--- subversion/libsvn_repos/dump.c (revision 29279)
+++ subversion/libsvn_repos/dump.c (working copy)
@@ -20,6 +20,7 @@
 #include "svn_pools.h"
 #include "svn_error.h"
 #include "svn_fs.h"
+#include "svn_iter.h"
 #include "svn_repos.h"
 #include "svn_string.h"
 #include "svn_path.h"
@@ -1087,3 +1088,139 @@
                             end_rev, incremental, FALSE, cancel_func,
                             cancel_baton, pool);
 }
+
+
+/*----------------------------------------------------------------------*/
+
+/* verify, based on dump */
+
+
+/* Creating a new revision that changes /A/B/E/bravo means creating new
+ directory listings for /, /A, /A/B, and /A/B/E in the new revision, with
+ each entry not changed in the new revision a link back to the entry in a
+ previous revision. svn_repos_replay()ing a revision does not verify that
+ those links are correct.
+
+ For paths actually changed in the revision we verify, we get directory
+ contents or file length twice: once in the dump editor, and once here.
+ We could create a new verify baton, store in it the changed paths, and
+ skip those here, but that means building an entire wrapper editor and
+ managing two levels of batons. The impact from checking these entries
+ twice should be minimal, while the code to avoid it is not.
+*/
+
+static svn_error_t *
+verify_directory_entry(void *baton, const void *key, apr_ssize_t klen,
+ void *val, apr_pool_t *pool)
+{
+ struct dir_baton *db = baton;
+ char *path = svn_path_join(db->path, (const char *)key, pool);
+ svn_node_kind_t kind;
+ apr_hash_t *dirents;
+ svn_filesize_t len;
+
+ SVN_ERR(svn_fs_check_path(&kind, db->edit_baton->fs_root, path, pool));
+ switch (kind) {
+ case svn_node_dir:
+ /* Getting this directory's contents is enough to ensure that our
+ link to it is correct. */
+ SVN_ERR(svn_fs_dir_entries(&dirents, db->edit_baton->fs_root, path, pool));
+ break;
+ case svn_node_file:
+ /* Getting this file's size is enough to ensure that our link to it
+ is correct. */
+ SVN_ERR(svn_fs_file_length(&len, db->edit_baton->fs_root, path, pool));
+ break;
+ default:
+ return svn_error_createf(SVN_ERR_NODE_UNEXPECTED_KIND, NULL,
+ _("Unexpected node kind %d for '%s'"), kind, path);
+ }
+
+ return SVN_NO_ERROR;
+}
+
+static svn_error_t *
+verify_close_directory(void *dir_baton,
+ apr_pool_t *pool)
+{
+ struct dir_baton *db = dir_baton;
+ apr_hash_t *dirents;
+ SVN_ERR(svn_fs_dir_entries(&dirents, db->edit_baton->fs_root,
+ db->path, pool));
+ SVN_ERR(svn_iter_apr_hash(NULL, dirents, verify_directory_entry,
+ dir_baton, pool));
+ return close_directory(dir_baton, pool);
+}
+
+svn_error_t *
+svn_repos_verify_fs(svn_repos_t *repos,
+ svn_stream_t *feedback_stream,
+ svn_revnum_t start_rev,
+ svn_revnum_t end_rev,
+ svn_cancel_func_t cancel_func,
+ void *cancel_baton,
+ apr_pool_t *pool)
+{
+ svn_fs_t *fs = svn_repos_fs(repos);
+ svn_revnum_t youngest;
+ svn_revnum_t rev;
+ apr_pool_t *iterpool = svn_pool_create(pool);
+
+ /* Determine the current youngest revision of the filesystem. */
+ SVN_ERR(svn_fs_youngest_rev(&youngest, fs, pool));
+
+ /* Use default vals if necessary. */
+ if (! SVN_IS_VALID_REVNUM(start_rev))
+ start_rev = 0;
+ if (! SVN_IS_VALID_REVNUM(end_rev))
+ end_rev = youngest;
+ if (! feedback_stream)
+ feedback_stream = svn_stream_empty(pool);
+
+ /* Validate the revisions. */
+ if (start_rev > end_rev)
+ return svn_error_createf(SVN_ERR_REPOS_BAD_ARGS, NULL,
+ _("Start revision %ld"
+ " is greater than end revision %ld"),
+ start_rev, end_rev);
+ if (end_rev > youngest)
+ return svn_error_createf(SVN_ERR_REPOS_BAD_ARGS, NULL,
+ _("End revision %ld is invalid "
+ "(youngest revision is %ld)"),
+ end_rev, youngest);
+
+ for (rev = start_rev; rev <= end_rev; rev++)
+ {
+ svn_delta_editor_t *dump_editor;
+ void *dump_edit_baton;
+ const svn_delta_editor_t *cancel_editor;
+ void *cancel_edit_baton;
+ svn_fs_root_t *to_root;
+
+ svn_pool_clear(iterpool);
+
+ /* Get cancellable dump editor, but with our close_directory handler. */
+ SVN_ERR(get_dump_editor((const svn_delta_editor_t **)&dump_editor,
+ &dump_edit_baton, fs, rev, "",
+ svn_stream_empty(pool), feedback_stream,
+ start_rev, FALSE, iterpool));
+ dump_editor->close_directory = verify_close_directory;
+ SVN_ERR(svn_delta_get_cancellation_editor(cancel_func, cancel_baton,
+ dump_editor, dump_edit_baton,
+ &cancel_editor,
+ &cancel_edit_baton,
+ iterpool));
+
+ SVN_ERR(svn_fs_revision_root(&to_root, fs, rev, iterpool));
+ SVN_ERR(svn_repos_replay2(to_root, "", SVN_INVALID_REVNUM, FALSE,
+ cancel_editor, cancel_edit_baton,
+ NULL, NULL, iterpool));
+ SVN_ERR(svn_stream_printf(feedback_stream, iterpool,
+ _("* Verified revision %ld.\n"),
+ rev));
+ }
+
+ svn_pool_destroy(iterpool);
+
+ return SVN_NO_ERROR;
+}
Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c (revision 29279)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -1610,6 +1610,10 @@
 
   /* Read the node-rev id. */
   value = apr_hash_get(headers, HEADER_ID, APR_HASH_KEY_STRING);
+ if (value == NULL)
+ return svn_error_create(SVN_ERR_FS_CORRUPT, NULL,
+ _("Found malformed header in "
+ "revision file"));
 
   SVN_ERR(svn_io_file_close(revision_file, pool));
 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-02-13 04:00:02 CET

This is an archived mail posted to the Subversion Dev mailing list.