[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r1555109 - /subversion/trunk/subversion/libsvn_fs_fs/pack.c

From: Evgeny Kotkov <evgeny.kotkov_at_visualsvn.com>
Date: Thu, 27 Feb 2014 04:53:43 +0400

Hi Stefan,

> Author: stefan2
> Date: Fri Jan 3 14:34:41 2014
> New Revision: 1555109
> URL: http://svn.apache.org/r1555109
> Log:
> Refine the FSFS f7 pack heuristics away from the container-centric
> order used by FSX. The changes only affect the order of text reps
> and noderevs; changed paths lists and properties remain as are.

It looks like there is a problem is the implementation of the reps sorting
algorithm. The sort_reps_range() function (which essentially is the core of
the new heuristics) can access uninitialized memory under certain circumstances.
There is a bunch of valgrind warnings for trunk_at_1572250 ...
  (valgrind fs-fs-pack-test)

  Conditional jump or move depends on uninitialised value(s)
     at 0x42C9BF: svn_fs_fs__id_part_eq (id.c:145)
     by 0x43505B: sort_reps_range (pack.c:932)
     by 0x4340CD: sort_reps (pack.c:1007)
     by 0x432B38: pack_range (pack.c:1391)
     by 0x4319BF: pack_log_addressed (pack.c:1570)
     by 0x43151B: pack_rev_shard (pack.c:1758)
     by 0x431023: pack_shard (pack.c:1817)
     by 0x430E33: pack_body (pack.c:1958)
     by 0x425DAF: with_some_lock_file (fs_fs.c:185)
     by 0x425C21: svn_fs_fs__with_write_lock (fs_fs.c:203)
     by 0x430B15: svn_fs_fs__pack (pack.c:1986)
     by 0x425491: fs_pack (fs.c:364)

...and this behavior can also be witnessed in some real-world scenarios.
I patched this function to make it crash upon the uninitialized variable usage
and received 11/14 coredumps in the fs-fs-pack-test and a 100%-reproducible
coredump when packing the svnrdump'ed http://googletest.googlecode.com/svn/
repository with "layout sharded 5":
Index: subversion/libsvn_fs_fs/pack.c
--- subversion/libsvn_fs_fs/pack.c (revision 1572250)
+++ subversion/libsvn_fs_fs/pack.c (working copy)
@@ -849,6 +849,7 @@ sort_reps_range(pack_context_t *context,
   const svn_prefix_string__t *path;
   int i, dest, best;
   svn_fs_fs__id_part_t rep_id;
+ svn_boolean_t rep_id_initialized = FALSE;
   fs_fs_data_t *ffd = context->fs->fsap_data;

   /* The logic below would fail for empty ranges. */
@@ -885,6 +886,7 @@ sort_reps_range(pack_context_t *context,
           /* next path */
           path = path_order[i]->path;
           rep_id = path_order[i]->rep_id;
+ rep_id_initialized = TRUE;

           /* Pick roundest non-linear deltified node. */
           if (roundness(path_order[best]->predecessor_count)
@@ -926,8 +928,11 @@ sort_reps_range(pack_context_t *context,
             path = path_order[i]->path;
             rep_id = path_order[i]->rep_id;
+ rep_id_initialized = TRUE;

+ SVN_ERR_ASSERT_NO_RETURN(rep_id_initialized);
         /* Pick nodes along the deltification chain. Skip side-branches. */
         if (svn_fs_fs__id_part_eq(&path_order[i]->rep_id, &rep_id))

The algorithm reorders the noderevs in three scans (please correct me if I
misunderstood some details). The first scan (1) bubbles up a single noderev
from every cluster, which is likely to be referenced from the following packs.
Cluster is a term that refers to a sequential noderev range, where every
element shares the same path. After that, the second scan (2) bubbles up
reconstructible delta chains per every cluster. The last scan (3) is the
least interesting one, because it simply grabs everything what was left after
(1) and (2).

Now, in order to distinguish different clusters, parts (1) and (2) remember
(like, "lock on") the (PATH, REP_ID) for the first noderev from the current
cluster. Iff the PATH changes (this means we have just entered the next
cluster), we update the stored (PATH, REP_ID) values. We need to remember the
REP_ID in order to reconstruct those delta chains in (2), because they start
from this REP_ID.

As far as I can tell, the problem lies in the way we handle the first path.
For any non-empty path set, the first path always marks the beginning of the
corresponding cluster. However, we only remember the PATH part of the (PATH,
REP_ID) pair for the first path and omit the REP_ID. In case when the path set
consists of *a single cluster* (i.e. all path_order_t objects share the same
PATH), phase (2) will end up in an attempt to reconstruct the delta chain
starting from the uninitialized REP_ID. There are scenarios that lead to this
behavior in fs-fs-pack-test (see create_packed_filesystem() with multiple /iota
content tweaks falling into the same cluster) and this can easily happen with
real-world repositories.

So, what do you think about fixing it this way?
Index: subversion/libsvn_fs_fs/pack.c
--- subversion/libsvn_fs_fs/pack.c (revision 1572250)
+++ subversion/libsvn_fs_fs/pack.c (working copy)
@@ -865,6 +865,7 @@ sort_reps_range(pack_context_t *context,
   dest = first;
   path = path_order[first]->path;
+ rep_id = path_order[first]->rep_id;
   best = first;

   /* (1) For each path, pick the "roundest" representation and put it in

Just to mention, even if this change looks good, I won't be able to commit it,
because I am still waiting for my Apache account to be created. The same is
true for those patches from http://svn.haxx.se/dev/archive-2014-02/0280.shtml
and http://svn.haxx.se/dev/archive-2014-02/0279.shtml
Hopefully, this process won't take too long.

Evgeny Kotkov
Received on 2014-02-27 01:54:35 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.