[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Deterministic FSFS revision files

From: Philip Martin <philip.martin_at_wandisco.com>
Date: Mon, 30 Jul 2012 17:47:20 +0100

When writing an FSFS revision file some parts are written in hash order
and so with a recent APR the order is not predictable. Thus loading the
same dumpfile into two separate empty repositories produces different
revision files. The things that vary are the order of the nodes in the
file and the order of the change lines at the end of the file.

As with the dumpfile format we have never formally specified the order
of a revision file so the randomn order is not strictly a bug but it
might be useful if the order was at least repeatable.

Things get more complicated in 1.8 as a side-effect of directory
deltification. Directory deltification works best if the directory
order is stable and so some hashes now use a non-APR hash function to
produce a stable order. Whether or not the revision file is repeatable
depends on which hashes are used. The change lines hash is still
unstable but the hash returned by svn_fs_fs__rep_contents_dir can be
stable or unstable depending on whether or not the has was found in the
cache or created on demand. It makes me even more uneasy that how much
variation is present in the revision file depends on our caching

I'm considering changing the commit code so that hashes are written in a
stable order and the revision files are repeatable. Does anyone think
this would be a bad idea?

Index: subversion/libsvn_fs_fs/fs_fs.c
--- subversion/libsvn_fs_fs/fs_fs.c (revision 1367025)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -7367,16 +7367,19 @@ write_final_rev(const svn_fs_id_t **new_id_p,
       apr_pool_t *subpool;
       apr_hash_t *entries, *str_entries;
- apr_hash_index_t *hi;
+ apr_array_header_t *sorted_entries;
+ int i;
       /* This is a directory. Write out all the children first. */
       subpool = svn_pool_create(pool);
       SVN_ERR(svn_fs_fs__rep_contents_dir(&entries, fs, noderev, pool));
- for (hi = apr_hash_first(pool, entries); hi; hi = apr_hash_next(hi))
+ sorted_entries = svn_sort__hash(entries, svn_sort_compare_items_lexically,
+ pool);
+ for (i = 0; i < sorted_entries->nelts; ++i)
- svn_fs_dirent_t *dirent = svn__apr_hash_index_val(hi);
+ svn_fs_dirent_t *dirent = APR_ARRAY_IDX(sorted_entries, i,
+ svn_sort__item_t).value;
           SVN_ERR(write_final_rev(&new_id, file, rev, fs, dirent->id,
@@ -7546,19 +7549,22 @@ write_final_changed_path_info(apr_off_t *offset_p,
   apr_hash_t *changed_paths;
   apr_off_t offset;
- apr_hash_index_t *hi;
   apr_pool_t *iterpool = svn_pool_create(pool);
   fs_fs_data_t *ffd = fs->fsap_data;
   svn_boolean_t include_node_kinds =
       ffd->format >= SVN_FS_FS__MIN_KIND_IN_CHANGED_FORMAT;
+ apr_array_header_t *sorted_changed_paths;
+ int i;
   SVN_ERR(get_file_offset(&offset, file, pool));
   SVN_ERR(svn_fs_fs__txn_changes_fetch(&changed_paths, fs, txn_id, pool));
+ sorted_changed_paths = svn_sort__hash(changed_paths,
+ svn_sort_compare_items_lexically, pool);
   /* Iterate through the changed paths one at a time, and convert the
      temporary node-id into a permanent one for each change entry. */
- for (hi = apr_hash_first(pool, changed_paths); hi; hi = apr_hash_next(hi))
+ for (i = 0; i < sorted_changed_paths->nelts; ++i)
       node_revision_t *noderev;
       const svn_fs_id_t *id;
@@ -7567,8 +7573,8 @@ write_final_changed_path_info(apr_off_t *offset_p,
- change = svn__apr_hash_index_val(hi);
- path = svn__apr_hash_index_key(hi);
+ change = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).value;
+ path = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).key;
       id = change->node_rev_id;

Certified & Supported Apache Subversion Downloads:
Received on 2012-07-30 18:47:59 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.