Re: svn commit: r1224836 - in /subversion/trunk: notes/knobs subversion/libsvn_fs_fs/fs_fs.c

From: Stefan Fuhrmann <eqfox_at_web.de>
Date: Sat, 07 Apr 2012 20:21:44 +0200

Daniel Shahaf wrote:
> stefan2_at_apache.org wrote on Mon, Dec 26, 2011 at 23:37:25 -0000:
>> Author: stefan2
>> Date: Mon Dec 26 23:37:25 2011
>> New Revision: 1224836
>>
>> URL: http://svn.apache.org/viewvc?rev=1224836&view=rev
>> Log:
>> Tune FSFS deltification strategy: Use linear deltification on the very top of the
>> deltification history and skip-delta only for larger distances. Most of the runtime
>> overhead is being counterbalanced by reduced I/O and masked by our membuffer
>> caching.
>> OTOH, we examine skip delta ranges linearly during commit. Therefore, runtime
>> becomes an issue when committing nodes with very deep change histories.
>> This patch simply limits the deltification history to some reasonable length.
>>
>> * subversion/libsvn_fs_fs/fs_fs.c
>> (SVN_FS_FS_MAX_LINEAR_DELTIFICATION,
>> SVN_FS_FS_MAX_DELTIFICATION_WALK): new tuning parameters
>> (choose_delta_base): implement the new deltification strategy
>> * notes/knobs
>> document the new defines
>>
>> Modified:
>> subversion/trunk/notes/knobs
>> subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
>>
>> Modified: subversion/trunk/notes/knobs
>> URL: http://svn.apache.org/viewvc/subversion/trunk/notes/knobs?rev=1224836&r1=1224835&r2=1224836&view=diff
>> ==============================================================================
>> --- subversion/trunk/notes/knobs (original)
>> +++ subversion/trunk/notes/knobs Mon Dec 26 23:37:25 2011
>> +3.6 SVN_FS_FS_MAX_LINEAR_DELTIFICATION
>> +
>> + Scope: libsvn_fs_fs
>> + Purpose: max length + 1 of the linear deltification history
>> + before skip-deltification kicks in
>> + Default: 16
>> + Range: natural integers
>> + Suggested: 2, 4, 8, 16, 32, 64
>> +
>> +3.7 SVN_FS_FS_MAX_DELTIFICATION_WALK
>> +
>> + Scope: libsvn_fs_fs
>> + Purpose: max skip deltification range. Change histories
>> + longer than that will be restarted with a fulltext.
>> + Default: 1023
>> + Range: natural integers
>> + Suggested: 1, 2, 3, 4, 5, 7, 11
>> +
> The suggested values (and Range) for both of these are inconsistent with
> the comments at the definitions (which want a power of two, or a power
> of two minus one).

r1310734 should fix that.

>> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
>> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1224836&r1=1224835&r2=1224836&view=diff
>> ==============================================================================
>> --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
>> +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Mon Dec 26 23:37:25 2011
>> @@ -78,6 +78,20 @@
>> #define SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR 1000
>> #endif
>>
>> +/* Begin deltification after a node history exceeded this limit.
>> + Useful values are 4 to 64 with 16 being a good compromise between
>> + computational overhead and repository size savings.
>> + Should be a power of 2.
>> + Values < 2 will result in standard skip-delta behavior. */
>> +#define SVN_FS_FS_MAX_LINEAR_DELTIFICATION 16
>> +
>> +/* Finding a deltification base takes operations proportional to the
>> + number of changes being skipped. To prevent exploding runtime
>> + during commits, limit the deltification range to this value.
>> + Should be a power of 2 minus one.
>> + Values < 1 disable deltification. */
>> +#define SVN_FS_FS_MAX_DELTIFICATION_WALK 1023
>> +
>> /* Following are defines that specify the textual elements of the
>> native filesystem directories and revision files. */
>>
>> @@ -5186,6 +5200,7 @@ choose_delta_base(representation_t **rep
>> apr_pool_t *pool)
>> {
>> int count;
>> + int walk;
>> node_revision_t *base;
>>
>> /* If we have no predecessors, then use the empty stream as a
>> @@ -5203,6 +5218,23 @@ choose_delta_base(representation_t **rep
>> count = noderev->predecessor_count;
>> count = count & (count - 1);
>>
>> + /* We use skip delta for limiting the number of delta operations
>> + along very long node histories. Close to HEAD however, we create
>> + a linear history to minimize delta size. */
>> + walk = noderev->predecessor_count - count;
>> + if (walk < SVN_FS_FS_MAX_LINEAR_DELTIFICATION)
>> + count = noderev->predecessor_count - 1;
>> +
> So, before this change the number of delta steps of the Nth revision was
> equal to the number of 1-bits in the binary representation of
> predecessor_count; with this change it's equal to (predecessor_count
> & 15) + (the number of 1-bits in the binary representation of
> (predecessor_count >> 4)), right?

Yes. With the default values, you get an average of 5.5
extra steps. For very short histories (far fewer than 16
changes), the overhead is smaller.

> In other words, this change doesn't affect the high level structure of
> the deltas tree (except it makes each leaf a 15-deltas chain), but it
> trades disk space savings for more delta combinations in historical cats
> (eg, revision 16n+8 will need X+8 deltas, instead of X+1.)

Correct, the high-level structure is untouched and the
repositories can be read & written by pre-1.8 and 1.8+
servers at the same time.
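The step counts discussed above can be reproduced with a small simulation. This is only a sketch: delta_base() mirrors the lines of choose_delta_base() quoted in the diff, but the function names are made up for illustration, versions are identified by their predecessor count alone, and the SVN_FS_FS_MAX_DELTIFICATION_WALK limit is ignored because its code is not part of the quoted hunk.

```c
/* Sketch only: names are illustrative, not Subversion API.
   Mirrors the base selection quoted in the diff. Given a node's
   predecessor count, return the predecessor count of its delta base. */
int delta_base(int predecessor_count, int linear_limit)
{
  /* Classic skip-delta: clear the lowest set bit. */
  int count = predecessor_count & (predecessor_count - 1);

  /* Distance that the skip-delta jump would cover. */
  int walk = predecessor_count - count;

  /* Short jumps become a delta against the direct predecessor. */
  if (walk < linear_limit)
    count = predecessor_count - 1;

  return count;
}

/* Number of deltas on the chain from this node back to node 0. */
int delta_steps(int predecessor_count, int linear_limit)
{
  int steps = 0;

  while (predecessor_count > 0)
    {
      predecessor_count = delta_base(predecessor_count, linear_limit);
      steps++;
    }
  return steps;
}

/* Average number of extra steps, compared to plain skip-delta
   (linear_limit 1), over the n nodes starting at 'first'.
   Assumes n > 0. */
double average_extra_steps(int first, int n, int linear_limit)
{
  long extra = 0;
  int pc;

  for (pc = first; pc < first + n; pc++)
    extra += delta_steps(pc, linear_limit) - delta_steps(pc, 1);

  return (double)extra / n;
}
```

With a limit of 16, delta_steps(24, 16) is 9 while delta_steps(24, 1) is 2, which is the X+8 versus X+1 case for revision 16n+8 with n = 1. Averaged over one full block of 16 nodes, e.g. average_extra_steps(1024, 16, 16), the difference is 5.5.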

The idea is to save disk space without impacting CPU
load too much (<10% for a linear delta limit of 16, and
roughly +50% for 64, IIRC).

> Are there scenarios in which this change is significant? What about
> people who store large binaries that change frequently in svn, should we
> be recommending them to set SVN_FS_FS_MAX_LINEAR_DELTIFICATION to
> a small number?

It is not easy to give a definitive answer to these questions.

Our xdelta implementation, for instance, tends to perform
poorly on large binaries due to its fixed-size small-ish
delta window. As a result, most windows will be stored
as full-text and the reconstruction code has a short-cut
for that case. Large, pseudo-random files will not be
affected by the deltification change.

A major motivation for this change has been directory
deltification. In combination, they bring many revisions
down to 4kB and less. A typical RAID will then deliver
30 revisions with a single I/O (128k strip size). Chances
are that this range covers more than one of the deltas.
Because the overall repository size gets smaller, the
user might even see a reduction in overall I/O.
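The "30 revisions" figure is a back-of-the-envelope number. As a sketch, assuming revisions of exactly 4 KiB stored back to back and aligned with the strip (both are simplifications, and the function name is made up for illustration):

```c
/* Sketch only: how many revisions of a given size fit into one
   RAID strip, assuming they are stored contiguously and aligned. */
unsigned long revisions_per_strip(unsigned long strip_bytes,
                                  unsigned long revision_bytes)
{
  if (revision_bytes == 0)
    return 0;

  return strip_bytes / revision_bytes;
}
```

For a 128 KiB strip and 4 KiB revisions this gives 32, i.e. roughly the 30 quoted above.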

Another recent improvement should also help to reduce
server load when combined with the deltification change:
All reconstructed fulltexts along the deltification chain will
be cached. "svn up" is a frequent operation and requires
access to (usually) HEAD and BASE. With the linear
delta chain, BASE is likely to be within the delta chain
of HEAD. Since HEAD needs to be reconstructed anyway,
access to BASE comes for free in that case.
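Whether BASE falls on the delta chain of HEAD can be checked with a short walk. This is a sketch under assumptions: the function names are illustrative, both versions are identified by their predecessor count within one node's history, and delta_base() only mirrors the base selection quoted in the diff, without the SVN_FS_FS_MAX_DELTIFICATION_WALK limit.

```c
/* Sketch only: names are illustrative, not Subversion API.
   Mirrors the base selection quoted in the diff. */
int delta_base(int predecessor_count, int linear_limit)
{
  int count = predecessor_count & (predecessor_count - 1);
  int walk = predecessor_count - count;

  if (walk < linear_limit)
    count = predecessor_count - 1;

  return count;
}

/* Return 1 if the version 'base' is visited while walking the delta
   chain of the version 'head', 0 otherwise. */
int on_delta_chain(int head, int base, int linear_limit)
{
  if (base < 0)
    return 0;

  /* Each step strictly decreases 'head', so the loop terminates. */
  while (head > base)
    head = delta_base(head, linear_limit);

  return head == base;
}
```

For example, with a limit of 16, on_delta_chain(1000, 995, 16) returns 1 because the chain of 1000 runs linearly down to 992. With plain skip-delta, on_delta_chain(1000, 995, 1) returns 0 because the chain jumps straight from 1000 to 992.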

This analysis assumes that related commits are relatively
close together. If many users work on disjoint parts of the
repository at the same time, no linear deltification and
a much lower deltification limit might result in the best
overall performance.

A physical layout like notes/fsfs-improvements.txt, OTOH,
will deliver best performance if both parameters are large.

> Should we be making these fsfs.conf settings, instead
> of compile-time macros?

Yes. Along with directory and property deltification.
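If these became runtime settings, the constraints from the code comments (a power of 2 for the linear limit, a power of 2 minus one for the walk limit) would have to be checked when the values are read. A minimal sketch of such a check follows; the function names are hypothetical, and nothing here reflects actual fsfs.conf option names or what r1310734 changed.

```c
/* Sketch only: hypothetical helpers, not Subversion API. */

/* True if exactly one bit is set: 1, 2, 4, 8, ... */
int is_power_of_two(unsigned int x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* True for 1, 3, 7, 15, ...: all bits below some position are set.
   0 is rejected here because values < 1 disable deltification and
   would need separate handling. */
int is_power_of_two_minus_one(unsigned int x)
{
  return x != 0 && (x & (x + 1)) == 0;
}

/* Check both knobs against the constraints stated in the comments
   at their definitions. */
int knobs_are_consistent(unsigned int max_linear_deltification,
                         unsigned int max_deltification_walk)
{
  return is_power_of_two(max_linear_deltification)
         && is_power_of_two_minus_one(max_deltification_walk);
}
```

The defaults pass: knobs_are_consistent(16, 1023) returns 1. Of the values originally suggested for the walk limit, 2, 4, 5 and 11 fail is_power_of_two_minus_one(), which is the inconsistency pointed out earlier in the thread.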

-- Stefan^2.
Received on 2012-04-07 22:28:33 CEST

This is an archived mail posted to the Subversion Dev mailing list.
