Re: [PATCH] svnadmin hotcopy, hot-backup.py race conditions fix

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2003-09-09 03:19:19 CEST

Vladimir Berezniker <vmpn@tigris.org> writes:

> Here is the hot copy patch. The patch implements svnadmin hotcopy
> command with optional --archive-logs flag. If flag is specified,
> after the copying is complete, unused *copied* log files are deleted
> from the source repository.

I agree with Julian, "archive" is not a good term for this behaviour.

> * subversion/include/svn_error.h
> (SVN_ERR_POOL): Implemented macro to handle subversion errors in
> code that uses subpools.

Why is this necessary? If an error is returned then the place that
handles the error is responsible for clearing the pool, this should
take care of any subpools.

> Index: subversion/svnadmin/main.c
> ===================================================================
> --- subversion/svnadmin/main.c (revision 7005)
> +++ subversion/svnadmin/main.c (working copy)
> @@ -50,11 +50,43 @@
> return SVN_NO_ERROR;
> }
>
> +static svn_error_t *
> +parse_local_repos_path(apr_getopt_t *os, const char ** repos_path,
> + apr_pool_t *pool)

Static functions should have a documentation string describing the
function and the parameters.

> +{
> + *repos_path = NULL;
>
> + if (os->ind < os->argc)
> + {
> + const char * path = os->argv[os->ind++];
> + SVN_ERR (svn_utf_cstring_to_utf8 (repos_path,
> + path,
> + NULL, pool));
> + *repos_path = svn_path_internal_style (*repos_path, pool);
> + }
> +
> + if (*repos_path == NULL)
> + {
> + return svn_error_create (SVN_ERR_CL_ARG_PARSING_ERROR,
> + NULL,
> + "repository argument required\n");
> + }
> + else if (svn_path_is_url (*repos_path))
> + {
> + return svn_error_createf (SVN_ERR_CL_ARG_PARSING_ERROR,
> + NULL,
> + "'%s' is an url when it should be a path\n",
> + *repos_path);
> + }
> +
> + return SVN_NO_ERROR;
> +}
> +
>
> /** Subcommands. **/
>
> static svn_opt_subcommand_t
> + subcommand_hotcopy,
> subcommand_create,
> subcommand_createtxn,
> subcommand_dump,
> @@ -78,7 +110,8 @@
> svnadmin__force_uuid,
> svnadmin__parent_dir,
> svnadmin__bdb_txn_nosync,
> - svnadmin__config_dir
> + svnadmin__config_dir,
> + svnadmin__archive_logs
> };
>
> /* Option codes and descriptions.
> @@ -126,6 +159,9 @@
> {"config-dir", svnadmin__config_dir, 1,
> "read user configuration files from directory ARG"},
>
> + {"archive-logs", svnadmin__archive_logs, 0,
> + "delete copied, unused log files from the source repository."},
> +
> {NULL}
> };
>
> @@ -160,6 +196,11 @@
> "Display this usage message.\n",
> {svnadmin__version} },
>
> + {"hotcopy", subcommand_hotcopy, {0},
> + "usage: svnadmin hotcopy REPOS_PATH NEW_REPOS_PATH\n\n"
> + "Makes a hot copy of a repository.\n\n",
> + {svnadmin__archive_logs} },
> +
> {"load", subcommand_load, {0},
> "usage: svnadmin load REPOS_PATH\n\n"
> "Read a 'dumpfile'-formatted stream from stdin, committing\n"
> @@ -229,6 +270,7 @@
> struct svnadmin_opt_state
> {
> const char *repository_path;
> + const char *new_repository_path;

Odd whitespace, this appears in quite a few places.

> svn_opt_revision_t start_revision, end_revision; /* -r X[:Y] */
> svn_boolean_t help; /* --help or -? */
> svn_boolean_t version; /* --version */
> @@ -236,6 +278,7 @@
> svn_boolean_t follow_copies; /* --copies */
> svn_boolean_t quiet; /* --quiet */
> svn_boolean_t bdb_txn_nosync; /* --bdb-txn-nosync */
> + svn_boolean_t archive_logs; /* --archive-logs */

ditto.

> Index: subversion/libsvn_fs/fs.c
> ===================================================================
> --- subversion/libsvn_fs/fs.c (revision 7005)
> +++ subversion/libsvn_fs/fs.c (working copy)
> @@ -531,6 +531,121 @@
> return svn_err;
> }
>
> +/**
> + * Archive all unused log files that have been copied to a different location
> + */
> +svn_error_t *
> +svn_fs__archive_logs(const char *live_path,
> + const char *backup_path,
> + apr_pool_t *pool)
> +{
> +
> + apr_array_header_t *logfiles;
> +
> + SVN_ERR (svn_fs_berkeley_logfiles (&logfiles,
> + live_path,
> + TRUE, /* Only unused logs */
> + pool));
> +
> + if (logfiles == NULL) {
> + return SVN_NO_ERROR;
> + }
> +
> + { /* Process unused logs from live area */
> + int log;
> + apr_pool_t *sub_pool = svn_pool_create(pool);
> +
> + /* Process log files. */
> + for (log = 0; log < logfiles->nelts; svn_pool_clear(sub_pool), log++)

I don't like svn_pool_clear in the for(), I'd prefer a separate line
as per the HACKING file.

> + {

The { should be indented, this comment applies in quite a few places.

> + const char * log_file = APR_ARRAY_IDX (logfiles, log, const char *);
> + const char *live_log_path
> + = svn_path_join (live_path, log_file, sub_pool);
> + const char *backup_log_path
> + = svn_path_join (backup_path, log_file, sub_pool);

Does this assume that the log files are always located directly within
the db/ directory? BDB has set_lg_dir that allows a DB admin to
change that.

> +
> + { /* Compare files. No point in using MD5 and waisting CPU cycles as we
> + got full copies of both logs */
> + svn_boolean_t files_match = FALSE;
> + svn_error_t * err = svn_io_files_contents_same_p(&files_match,
> + live_log_path,
> + backup_log_path,
> + sub_pool);
> + if (err) {
> + svn_error_clear(err);
> + files_match = FALSE;

I'd like a bit of explanation here. Why is it acceptable to ignore
errors?

> + }
> +
> +
> + if(files_match == FALSE) {
> + continue;
> + }
> + }
> +
> + SVN_ERR_POOL(svn_io_remove_file(live_log_path, sub_pool), sub_pool);
> + }
> +
> + svn_pool_destroy(sub_pool);
> + }
> +
> + return SVN_NO_ERROR;
> +}
> +
> +svn_error_t *
> +svn_fs_hotcopy_berkeley (const char *src_path,
> + const char *dest_path,
> + svn_boolean_t archive_logs,
> + apr_pool_t *pool)
> +{
> + /* Check DBD version, just in case */
> + SVN_ERR (check_bdb_version (pool));
> +
> + /* Copy the DB_CONFIG file. */
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"DB_CONFIG", pool));
> +
> + /* Copy the databases. */
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"nodes", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"revisions", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"transactions", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"copies", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"changes", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"representations", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"strings", pool));
> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path, &"uuids", pool));

This appears to assume that the data files are located directly in the
db/ dir. BDB has set_data_dir that allows the DB admin to change the
location.

> +
> + {
> + apr_array_header_t *logfiles;
> + int log;
> +
> + SVN_ERR (svn_fs_berkeley_logfiles (&logfiles,
> + src_path,
> + FALSE, /* All logs */
> + pool));
> +
> + if (logfiles == NULL) {
> + return SVN_NO_ERROR;
> + }
> +
> + /* Process log files. */
> + for (log = 0; log < logfiles->nelts; log++)
> + {

The { should be indented.

> + SVN_ERR(svn_io_dir_file_copy(src_path, dest_path,
> + APR_ARRAY_IDX (logfiles, log, const char *),
> + pool));
> + }
> + }
> +
> + /* Since this is a copy we will have exclusive access to the repository. */
> + SVN_ERR(svn_fs_berkeley_recover(dest_path, pool));
> +
> + if(archive_logs == TRUE)
> + {

ditto

> + SVN_ERR(svn_fs__archive_logs(src_path, dest_path, pool));
> + }
> +
> + return SVN_NO_ERROR;
> +}
> +
>
> /* Gaining access to an existing Berkeley DB-based filesystem. */
>
> Index: subversion/libsvn_subr/io.c
> ===================================================================
> --- subversion/libsvn_subr/io.c (revision 7005)
> +++ subversion/libsvn_subr/io.c (working copy)
> @@ -182,7 +182,7 @@
>
>
>
> -/*** Copying and appending files. ***/
> +/*** Creating, copying and appending files. ***/
>
> svn_error_t *
> svn_io_copy_file (const char *src,
> @@ -463,7 +463,45 @@
> #endif
> }
>
> +svn_error_t *svn_io_file_create (const char *file,
> + const char *contents,
> + apr_pool_t *pool)
> +{
> + apr_status_t apr_err;
> + apr_file_t *f = NULL;

Why is f initialised?

> + apr_size_t written;
>
> + SVN_ERR (svn_io_file_open (&f, file,
> + (APR_WRITE | APR_CREATE | APR_EXCL),
> + APR_OS_DEFAULT,
> + pool));
> +
> + apr_err = apr_file_write_full (f, contents, strlen (contents), &written);
> + if (apr_err)
> + return svn_error_createf
> + (apr_err, NULL, "svn_io_file_create: error writing '%s'", file);
> +
> + apr_err = apr_file_close (f);
> + if (apr_err)
> + return svn_error_createf
> + (apr_err, NULL,
> + "svn_io_file_create: error closing '%s'", file);
> + return SVN_NO_ERROR;
> +}
> +
> +svn_error_t *svn_io_dir_file_copy (const char *src_path,
> + const char *dest_path,
> + const char *file,
> + apr_pool_t *pool)
> +{
> + const char *file_dest_path = svn_path_join (dest_path, file, pool);
> + const char *file_src_path = svn_path_join (src_path, file, pool);
> +
> + SVN_ERR (svn_io_copy_file(file_src_path, file_dest_path, TRUE, pool));

For consistency it would be better to choose one of

foo (args);
foo(args);

and not mix the two styles. Try to follow the style of the file being
modified.

> +
> + return SVN_NO_ERROR;
> +}
> +
>
> /*** Modtime checking. ***/
>
> @@ -730,7 +768,74 @@
> return SVN_NO_ERROR;
> }
>
> +
> +/*** File locking. ***/
> +/* Clear all outstanding locks on ARG, an open apr_file_t *. */
> +static apr_status_t
> +svn_io__file_clear_and_close (void *arg)
> +{
> + apr_status_t apr_err;
> + apr_file_t *f = arg;
>
> + /* Remove locks. */
> + apr_err = apr_file_unlock (f);
> + if (apr_err)
> + return apr_err;
> +
> + /* Close the file. */
> + apr_err = apr_file_close (f);
> + if (apr_err)
> + return apr_err;
> +
> + return 0;
> +}
> +
> +
> +svn_error_t *svn_io_file_lock (const char *lock_file,
> + svn_boolean_t exclusive,
> + apr_pool_t *pool)
> +{
> + int locktype = APR_FLOCK_SHARED;
> + apr_file_t *lockfile_handle;
> + apr_int32_t flags;
> + apr_status_t apr_err;
> +
> + if(exclusive == TRUE) {
> + locktype = APR_FLOCK_EXCLUSIVE;
> + }
> +
> + flags = APR_READ;
> + if (locktype == APR_FLOCK_EXCLUSIVE)
> + flags |= APR_WRITE;

I find it mildly distracting that you switch randomly between

   if (foo) {
     bar;
   }

and

if (foo)
bar;

I prefer the latter.

> +
> + SVN_ERR (svn_io_file_open (&lockfile_handle, lock_file, flags,
> + APR_OS_DEFAULT,
> + pool));
> +
> + /* Get lock on the filehandle. */
> + apr_err = apr_file_lock (lockfile_handle, locktype);
> + if (apr_err)
> + {

The { should be indented.

> + const char *lockname = "unknown";
> + if (locktype == APR_FLOCK_SHARED)
> + lockname = "shared";
> + if (locktype == APR_FLOCK_EXCLUSIVE)
> + lockname = "exclusive";
> +
> + return svn_error_createf
> + (apr_err, NULL,
> + "svn_io_file_lock: %s lock on file `%s' failed",
> + lockname, lock_file);
> + }
> +
> + apr_pool_cleanup_register (pool, lockfile_handle,
> + svn_io__file_clear_and_close,
> + apr_pool_cleanup_null);
> +
> + return SVN_NO_ERROR;
> +}
> +
> +
>
> /* TODO write test for these two functions, then refactor. */
>
> Index: subversion/libsvn_repos/repos.c
> ===================================================================
> --- subversion/libsvn_repos/repos.c (revision 7005)
> +++ subversion/libsvn_repos/repos.c (working copy)
> +
> +struct hotcopy_ctx_t {
> + const char *dest; /* target location to construct */
> + unsigned int src_len; /* len of the source path*/
> +};
> +
> +
> +static svn_error_t *hotcopy_structure (void *baton,
> + const char *path,
> + const apr_finfo_t *finfo,
> + apr_pool_t *pool)

Could use a documentation comment for this function.

> +{
> + const struct hotcopy_ctx_t *ctx = ((struct hotcopy_ctx_t *) baton);
> + const char *sub_path;
> + const char *target;
> + svn_boolean_t fs_dir = FALSE;
> +
> + if(strlen(path) == ctx->src_len)
> + {
> + sub_path = &"";

Huh? Do you mean sub_path = ""? (I have to guess since path isn't
documented ;)

> + }
> + else
> + {
> + sub_path = &path[ctx->src_len+1];
> +
> + /* Check if we are inside db directory and if so skip it */
> + if(svn_path_compare_paths(
> + svn_path_get_longest_ancestor(SVN_REPOS__DB_DIR, sub_path, pool),
> + SVN_REPOS__DB_DIR) == 0) {
> + return SVN_NO_ERROR;
> + }
> +
> +
> + if(svn_path_compare_paths(
> + svn_path_get_longest_ancestor(SVN_REPOS__LOCK_DIR, sub_path, pool),
> + SVN_REPOS__LOCK_DIR) == 0) {
> + return SVN_NO_ERROR;
> + }
> +
> + }
> +
> + target = svn_path_join(ctx->dest, sub_path, pool);
> +
> + if (finfo->filetype == APR_DIR)
> + {
> + SVN_ERR (create_repos_dir (target, pool));
> + }
> + else if(finfo->filetype == APR_REG)
> + {
> +
> + SVN_ERR(svn_io_copy_file(path, target, TRUE, pool));
> + }
> +
> + return SVN_NO_ERROR;
> +}
> +
> +
> +
> +static svn_error_t *
> +lock_db_logs_file (svn_repos_t *repos,
> + svn_boolean_t exclusive,
> + apr_pool_t *pool)
> +{
> + const char * lock_file = svn_repos_db_logs_lockfile(repos, pool);
> + svn_node_kind_t kind;
> + SVN_ERR(svn_io_check_path(lock_file, &kind, pool));
> +
> + /* Try to create a lock file if it is missing.
> + Ignore creation errors just in case file got created by another process
> + while we were checking. */
> + if(kind == svn_node_none) {
> + svn_error_t * err = create_db_logs_lock(repos, pool);
> + if(err) {
> + svn_error_clear(err);
> + }

To ignore an error unconditionally use

svn_error_clear (create_db_logs_lock (...));

There is a race between svn_io_check_path and create_db_logs_lock.
Since you are ignoring errors, why bother with svn_io_check_path at
all?

> + }
> +
> + SVN_ERR(svn_io_file_lock(lock_file, exclusive, pool));
> +
> + return SVN_NO_ERROR;
> +}
> +
> +
> +/* Make a copy of a repository with hot backup of fs. */
> +svn_error_t *
> +svn_repos_hotcopy (const char *src_path,
> + const char *dst_path,
> + svn_boolean_t archive_logs,
> + apr_pool_t *pool)
> +{
> + svn_repos_t *src_repos = NULL;
> + svn_repos_t *dst_repos = NULL;

Do these need to be initialised?

> + struct hotcopy_ctx_t hotcopy_context;
> +
> + /* Try to open original repository */
> + SVN_ERR (get_repos (&src_repos, src_path,
> + APR_FLOCK_SHARED,
> + FALSE, /* don't try to open the db yet. */
> + pool));
> +
> + /* If we are going to archive logs, then get an exclusive lock on
> + db-logs.lock, to ensure that no one else will work with logs.
> +
> + If we are just copying, then get a shared lock to ensure that
> + no one else will archive logs while we copying them */
> +
> + SVN_ERR (lock_db_logs_file(src_repos, archive_logs, pool));
> +
> + /* Copy the repository to a new path, with exception of
> + specially handled directories */
> +
> + hotcopy_context.dest = dst_path;
> + hotcopy_context.src_len = strlen(src_path);
> + SVN_ERR (svn_io_dir_walk (src_path,
> + 0,
> + hotcopy_structure,
> + &hotcopy_context,
> + pool));
> +
> + /* Prepare dst_repos object so that we may create locks,
> + so that we may open repository */
> +
> + /* Allocate a repository object. */
> + dst_repos = apr_pcalloc (pool, sizeof (*dst_repos));

Comment is redundant.

> +
> + /* Initialize the repository paths. */
> + init_repos_dirs (dst_repos, dst_path, pool);

Comment is redundant.

> +
> + /* Create the lock directory. */
> + SVN_ERR (create_locks (dst_repos, pool));

Comment is redundant.

> +
> + SVN_ERR (create_repos_dir(dst_repos->db_path, pool));
> + /* Reopen repository the right way. Since above is just work around
> + util we can create locks without an open repository */
> +
> + /* Exclusevly lock the new repository.

Typo: "Exclusively".

> + No one should be accessing it at the moment */
> + SVN_ERR (get_repos (&dst_repos, dst_path,
> + APR_FLOCK_EXCLUSIVE,
> + FALSE, /* don't try to open the db yet. */
> + pool));
> +
> +
> + SVN_ERR ( svn_fs_hotcopy_berkeley(src_repos->db_path, dst_repos->db_path,
> + archive_logs, pool));

More odd whitespace. I don't think we use "SVN_ERR ( foo" anywhere in
the code.

> +
> + return SVN_NO_ERROR;
> +}
> +

Overall, this patch looks promising.

-- 
Philip Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Tue Sep 9 03:20:41 2003

This message: [ Message body ]
Next message: Philip Martin: "Re: [PATCH] Change head/tail call in build/buildcheck.sh"
Previous message: Art Haas: "[PATCH] Change head/tail call in build/buildcheck.sh"
In reply to: Vladimir Berezniker: "[PATCH] svnadmin hotcopy, hot-backup.py race conditions fix"
Next in thread: Vladimir Berezniker: "Re: [PATCH] svnadmin hotcopy, hot-backup.py race conditions fix"
Reply: Vladimir Berezniker: "Re: [PATCH] svnadmin hotcopy, hot-backup.py race conditions fix"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]