Deterministic FSFS revision files

From: Philip Martin <philip.martin_at_wandisco.com>
Date: Mon, 30 Jul 2012 17:47:20 +0100

When writing an FSFS revision file some parts are written in hash order
and so with a recent APR the order is not predictable. Thus loading the
same dumpfile into two separate empty repositories produces different
revision files. The things that vary are the order of the nodes in the
file and the order of the change lines at the end of the file.

As with the dumpfile format we have never formally specified the order
of a revision file so the randomn order is not strictly a bug but it
might be useful if the order was at least repeatable.

Things get more complicated in 1.8 as a side-effect of directory
deltification. Directory deltification works best if the directory
order is stable and so some hashes now use a non-APR hash function to
produce a stable order. Whether or not the revision file is repeatable
depends on which hashes are used. The change lines hash is still
unstable but the hash returned by svn_fs_fs__rep_contents_dir can be
stable or unstable depending on whether or not the has was found in the
cache or created on demand. It makes me even more uneasy that how much
variation is present in the revision file depends on our caching
strategy.

I'm considering changing the commit code so that hashes are written in a
stable order and the revision files are repeatable. Does anyone think
this would be a bad idea?

Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c (revision 1367025)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -7367,16 +7367,19 @@ write_final_rev(const svn_fs_id_t **new_id_p,
     {
       apr_pool_t *subpool;
       apr_hash_t *entries, *str_entries;
- apr_hash_index_t *hi;
+ apr_array_header_t *sorted_entries;
+ int i;

       /* This is a directory. Write out all the children first. */
       subpool = svn_pool_create(pool);

       SVN_ERR(svn_fs_fs__rep_contents_dir(&entries, fs, noderev, pool));
-
- for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
+ sorted_entries = svn_sort__hash(entries, svn_sort_compare_items_lexically,
+ pool);
+ for (i = 0; i < sorted_entries->nelts; ++i)
         {
- svn_fs_dirent_t *dirent = svn__apr_hash_index_val(hi);
+ svn_fs_dirent_t *dirent = APR_ARRAY_IDX(sorted_entries, i,
+ svn_sort__item_t).value;

           svn_pool_clear(subpool);
           SVN_ERR(write_final_rev(&new_id, file, rev, fs, dirent->id,
@@ -7546,19 +7549,22 @@ write_final_changed_path_info(apr_off_t *offset_p,
{
   apr_hash_t *changed_paths;
   apr_off_t offset;
- apr_hash_index_t *hi;
   apr_pool_t *iterpool = svn_pool_create(pool);
   fs_fs_data_t *ffd = fs->fsap_data;
   svn_boolean_t include_node_kinds =
       ffd->format >= SVN_FS_FS__MIN_KIND_IN_CHANGED_FORMAT;
+ apr_array_header_t *sorted_changed_paths;
+ int i;

   SVN_ERR(get_file_offset(&offset, file, pool));

   SVN_ERR(svn_fs_fs__txn_changes_fetch(&changed_paths, fs, txn_id, pool));
+ sorted_changed_paths = svn_sort__hash(changed_paths,
+ svn_sort_compare_items_lexically, pool);

   /* Iterate through the changed paths one at a time, and convert the
      temporary node-id into a permanent one for each change entry. */
- for (hi = apr_hash_first(pool, changed_paths); hi; hi = apr_hash_next(hi))
+ for (i = 0; i < sorted_changed_paths->nelts; ++i)
     {
       node_revision_t *noderev;
       const svn_fs_id_t *id;
@@ -7567,8 +7573,8 @@ write_final_changed_path_info(apr_off_t *offset_p,

       svn_pool_clear(iterpool);

- change = svn__apr_hash_index_val(hi);
- path = svn__apr_hash_index_key(hi);
+ change = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).value;
+ path = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).key;

       id = change->node_rev_id;

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Received on 2012-07-30 18:47:59 CEST

This message: [ Message body ]
Next message: C. Michael Pilato: "Re: Deterministic FSFS revision files"
Previous message: Julian Foad: "Re: Heads-up: new typedef 'svn_merge_rangelist_t'"
Next in thread: C. Michael Pilato: "Re: Deterministic FSFS revision files"
Reply: C. Michael Pilato: "Re: Deterministic FSFS revision files"
Reply: Stefan Fuhrmann: "Re: Deterministic FSFS revision files"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]