Since Branko has a working delta combiner now, I decided to see how
skip-deltas would affect performance on the SVN repository. (My
theory was that the ability to paste together small deltas quickly
before applying them is nice, but having many fewer of them to paste
together for a heavily modified file is even nicer.)
Unfortunately, skip-deltas don't seem very practical at the moment
because it takes too long to walk the predecessor list of a node
revision. If I could get a quick count of the number of predecessors
of a node-revision, then I think I could produce something practical.
Perhaps I can figure out how to store that information in the
node-revision information in the database.
Repeating Branko's tests, I got (all times in seconds, all
tests run against the repository created by that version of the code
base using a dump of rev 2656 of the SVN repository):
Trunk Iss531 Skip1 Skip2 Skip3
----- ------ ----- ----- -----
svnadmin load 853 949 1492 1096 1441
svn co -r500 44 42 38 37 41
svn up -r2500 321 345 328 355 337
svn up -r1500 120 110 108 115 128
Repos db size 55324K 38712K 38744K 51048K 38740K
where Trunk is revision 2714 of the trunk, Iss531 is revision 2714 of
branches/issue-531-dev, and Skip1, Skip2, and Skip3 are various tweaks
on the skip-deltas concept; specifically:
Skip1: Straight deterministic skip-deltas. If 2^N divides the
number of predecessors, we redeltify the 2^Nth predecessor so that
it is represnted as a delta against the head.
Skip2: Non-deterministic skip-deltas. We randomly determine the
number of predecessors to redeltify (always redeltify the immediate
parent; 50% of the time redeltify the one two back, 25% of the time
redeltify the one four back, etc.). This saves us from doing a full
walk of the predecessor list for each new node-revision. Using
random numbers in libsvn_fs is kind of icky from a code maintenance
standpoint, though.
Skip3: Deterministic skip-deltas with a minimum threshold. If the
number of predecessors is less than 32 (semi-arbitrary number), we
don't deltify more than the immediate parent. This is an attempt to
avoid the space and re-deltification time penalty of skip deltas
when the benefit would be small. Unfortunately, it doesn't seem to
help much.
Observations:
* In my tests, none of the implementations yields particularly
better checkout or update times than the trunk. (Branko reported
a factor of two gain from his delta combiner on checking out
revision 500, but I didn't see that. Not sure why.)
* Deterministic skip-deltas don't seem to impose much of a space
penalty in practice. Non-deterministic ones do, though.
* Skip-deltas impose a significant penalty on checkin time (a big
increase in the time spent in user space, if you go see my raw
data at the end). Most of this penalty appears to come from
having to do a complete walk of the predecessor list for each new
node-revision.
I've included my implementation of "skip3" for the record. Since I
don't plan for it to be checked in, no log message. After the patch,
I've included the raw data for my table above.
Index: ./tree.c
===================================================================
--- ./tree.c
+++ ./tree.c Fri Jul 26 01:39:47 2002
@@ -1129,6 +1129,75 @@
};
+static svn_error_t *
+collect_predecessors (void *baton, dag_node_t *this_node, int *done,
+ trail_t *trail)
+{
+ apr_array_header_t *predecessors = baton;
+ dag_node_t **slot;
+
+ if (this_node)
+ {
+ slot = apr_array_push (predecessors);
+ *slot = this_node;
+ }
+
+ return SVN_NO_ERROR;
+}
+
+
+/* Redeltify predecessor node-revisions of the one we added. The idea
+ is to require at most 2*lg(N) deltas to be applied to get to any
+ node-revision in a chain of N predecessors. We do this using a
+ technique derived from skip lists:
+
+ 100% of the time we redeltify our parent
+ 50% of the time we also redeltify two predecessors back
+ 25% of the time we also redeltify four predecessors back
+ etc.
+
+ This implementation is deterministic, which guarantees the 2*lg(N)
+ upper bound but requires scanning the entire predecessor chain for
+ each commit to count the number of predecessors. We could,
+ alternatively, choose nondeterministically and avoid the count. */
+static svn_error_t *
+txn_skip_deltify (dag_node_t *node, int props_only, trail_t *trail)
+{
+ apr_array_header_t *predecessors;
+ int stop, i;
+ dag_node_t *prednode;
+
+ predecessors = apr_array_make (trail->pool, 0, sizeof(dag_node_t *));
+ SVN_ERR (svn_fs__dag_walk_predecessors (node, collect_predecessors,
+ predecessors, trail));
+
+ /* Find the first power of two which doesn't evenly divide the
+ number of predecessors. This is the marker we will stop
+ just prior to. */
+ for (stop = 2; predecessors->nelts % stop == 0; stop *= 2)
+ ;
+
+ /* Tradeoff: until the number of predecessors is at least 32, don't
+ do skip deltas. Otherwise we get the cost of skip-deltas up
+ front, whereas the scaling benefits don't show up until the
+ number of predecessors is large. */
+ if (predecessors->nelts < 32)
+ stop = 2;
+
+ /* Now deltify the predecessors one back, two back, four back, and
+ so on, until we hit stop. */
+ for (i = 1; i < stop; i *= 2)
+ {
+ /* elts[0] is the first predecessor back, so we have to subtract
+ one from the offset. */
+ prednode = ((dag_node_t **) predecessors->elts)[i - 1];
+ SVN_ERR (svn_fs__dag_deltify (prednode, node, props_only, trail));
+ }
+
+ return SVN_NO_ERROR;
+}
+
+
/* Deltify ID's predecessor iff ID is mutable under TXN_ID in FS. If
ID is a mutable directory, recurse. Do this as part of TRAIL. */
static svn_error_t *
@@ -1141,7 +1210,6 @@
dag_node_t *node;
apr_hash_t *entries;
int is_dir;
- const svn_fs_id_t *pred_id;
/* Not mutable? Go no further. This is safe to do because for
items in the tree to be mutable, their parent dirs must also be
@@ -1180,14 +1248,8 @@
}
}
- /* If we have have a predecessor, deltify it against the current
- node revision. */
- if ((pred_id = noderev->predecessor_id))
- {
- dag_node_t *pred_node;
- SVN_ERR (svn_fs__dag_get_node (&pred_node, fs, pred_id, trail));
- SVN_ERR (svn_fs__dag_deltify (pred_node, node, is_dir, trail));
- }
+ if (noderev->predecessor_id)
+ SVN_ERR (txn_skip_deltify (node, is_dir, trail));
return SVN_NO_ERROR;
}
Trunk
-----
time /var/ghudson/svn/subversion/svnadmin/svnadmin load /var/ghudson/svn-test < /var/ghudson/svn-repos-2656.dumpfile
108.710u 20.020s 14:13.29 15.0% 0+0k 0+0io 453pf+0w
time /var/ghudson/svn/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svn-test /var/ghudson/svn-test-wc
12.880u 2.550s 0:43.69 35.3% 0+0k 0+0io 568pf+0w
time /var/ghudson/svn/subversion/clients/cmdline/svn up -r2500
119.560u 18.710s 5:21.21 43.0% 0+0k 0+0io 582pf+0w
time /var/ghudson/svn/subversion/clients/cmdline/svn up -r1500
23.200u 4.560s 1:59.79 23.1% 0+0k 0+0io 581pf+0w
total 55324
Delta combiner branch, without skip-deltas
------------------------------------------
time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc2-test < /var/ghudson/svn-repos-2656.dumpfile
112.420u 22.340s 15:48.58 14.2% 0+0k 0+0io 478pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc2-test /var/ghudson/svnc2-test-wc
13.420u 2.450s 0:42.49 37.3% 0+0k 0+0io 568pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
122.450u 19.620s 5:44.68 41.2% 0+0k 0+0io 580pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
23.940u 4.160s 1:50.00 25.5% 0+0k 0+0io 584pf+0w
total 38712
Delta combiner branch, with skip-deltas
---------------------------------------
time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc-test < /var/ghudson/svn-repos-2656.dumpfile
722.380u 32.530s 24:52.44 50.5% 0+0k 0+0io 491pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc-test /var/ghudson/svnc-test-wc
13.290u 2.360s 0:38.11 41.0% 0+0k 0+0io 581pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
119.040u 18.220s 5:27.78 41.8% 0+0k 0+0io 593pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
23.370u 4.160s 1:48.47 25.3% 0+0k 0+0io 596pf+0w
total 38744
Delta combiner branch, with non-deterministic skip-deltas
---------------------------------------------------------
time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc3-test < /var/ghudson/svn-repos-2656.dumpfile
187.790u 23.870s 18:15.69 19.3% 0+0k 0+0io 484pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc3-test /var/ghudson/svnc3-test-wc
12.580u 2.280s 0:37.09 40.0% 0+0k 0+0io 575pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
118.930u 18.970s 5:55.42 38.7% 0+0k 0+0io 586pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
22.760u 4.360s 1:55.07 23.5% 0+0k 0+0io 586pf+0w
total 51048
Delta combiner branch, with deterministic limited skip-deltas
-------------------------------------------------------------
time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc4-test < /var/ghudson/svn-repos-2656.dumpfile
691.360u 30.850s 24:01.12 50.1% 0+0k 0+0io 483pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc4-test /var/ghudson/svnc4-test-wc
13.740u 2.350s 0:41.14 39.1% 0+0k 0+0io 573pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
119.800u 19.490s 5:37.15 41.3% 0+0k 0+0io 584pf+0w
time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
23.250u 4.630s 2:07.66 21.8% 0+0k 0+0io 588pf+0w
total 38740
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Fri Jul 26 08:35:17 2002