With the recently added support for LZ4 compression (r1801940 et al),
we now have the option of using it by default for on-disk data and
over the wire.
For those who haven't been following this topic, here's a quick recap:
- Currently, our default compression algorithm is zlib.
- LZ4 offers much faster compression and decompression than zlib
and includes a heuristic to skip incompressible data.
- LZ4 has a worse compression ratio than zlib-5 (our current default).
Its ratio is roughly comparable to that of zlib-1, although zlib-1 is
still slightly better. See https://quixdb.github.io/squash-benchmark/
for additional information on this (the codecs to compare are
"lz4 - 7" and "zlib - 1").
- Only the new filesystem format 8 allows using LZ4 for the on-disk data.
- Using LZ4 over the wire requires both endpoints to advertise that they
know how to deal with the new svndiff2 format that allows LZ4 compression.
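To get a feel for the ratio/speed trade-off between zlib levels on your own data, something like the following rough sketch can help. It uses only Python's standard zlib module (LZ4 is not in the standard library, so it is not measured here), and the two inputs are synthetic stand-ins that I made up, not Subversion data.

```python
import os
import time
import zlib


def measure(data, level):
    """Return (compressed size in bytes, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the compressed stream must decode to the input.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Synthetic inputs (assumptions for illustration only).
text_like = b"".join(b"line %d: some repetitive log text\n" % i
                     for i in range(50000))
random_like = os.urandom(1 << 20)  # stands in for incompressible content

for name, data in (("text", text_like), ("random", random_like)):
    for level in (1, 5):
        size, secs = measure(data, level)
        print(f"{name:6s} zlib-{level}: {len(data)} -> {size} bytes "
              f"in {secs:.4f}s")
```

The timings will vary by machine; the point is only to compare levels 1 and 5 side by side and to see how little is gained on incompressible input.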
There are two questions to consider:
(1) Do we want to start using LZ4 compression over the wire by default?
If yes, do we want this default to apply to all installations or to
only affect part of the installations where it makes sense?
(2) Do we want to switch to the LZ4 compression for the on-disk data
by default?
I propose the following approach. Please note that for the wire format
part, it only considers the http:// protocol, but we can optionally adjust
(A) For the HTTP wire format, we start using LZ4 compression by default,
but only over local networks.
We probably don't want to use LZ4 compression unconditionally, as that
would be a regression over WAN, where a better compression ratio
usually matters more than compression speed. At the same time, even on
local networks we cannot disable compression altogether: on slow 10 or
even 100 Mbps LANs, where throughput is limited by the network, fast
compression can be better than no compression. This is where LZ4 comes
to the rescue, offering a reasonable compression ratio at a high
compression speed.
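The trade-off above can be sketched with a back-of-the-envelope model. This is only an illustration: it assumes the sender compresses and transmits in a pipeline, so the effective throughput is the smaller of the compression speed and the link speed divided by the compression ratio. The codec speeds and ratios below are made-up placeholder figures, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    speed_mb_s: float  # assumed compression speed, MB/s of input
    ratio: float       # assumed compressed size / original size


def transfer_seconds(size_mb, link_mbps, codec):
    """Estimate the time to send size_mb of data over a link.

    Pipelined model: throughput is capped either by the compressor
    or by how much original data fits through the link per second.
    """
    link_mb_s = link_mbps / 8.0
    throughput = min(codec.speed_mb_s, link_mb_s / codec.ratio)
    return size_mb / throughput


# Placeholder figures (assumptions, not benchmark results).
CODECS = (
    Codec("none", float("inf"), 1.00),
    Codec("lz4", 400.0, 0.60),
    Codec("zlib-5", 30.0, 0.45),
)

for link_mbps in (10, 100, 1000):
    for codec in CODECS:
        secs = transfer_seconds(2048, link_mbps, codec)
        print(f"{link_mbps:5d} Mbps  {codec.name:7s} {secs:8.1f} s")
```

With these assumed figures, the slow link is the bottleneck at 10 Mbps, so any compression beats none and the smaller ratio wins; at 1 Gbps the slower compressor becomes the bottleneck. The crossover point depends entirely on the numbers plugged in.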
This approach is currently implemented with the http-compression=auto
client-side configuration option (r1803899), which is the new default.
While the HTTP client generally chooses the compression algorithm,
the server can override its preference. If mod_dav_svn's
SVNCompressionLevel directive is set to 1, the server overrides the
client's preference and still sends svndiff2 / LZ4 data if the client
can accept it.
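The override rule just described can be written down as a small decision function. This is a simplified model for discussion, not the actual mod_dav_svn code; the function and parameter names are hypothetical, and it only covers the cases mentioned above (it says nothing about other SVNCompressionLevel values beyond "follow the client").

```python
def choose_svndiff_version(client_accepts_svndiff2,
                           client_prefers_svndiff2,
                           server_compression_level):
    """Return 2 for svndiff2 (LZ4) or 1 for svndiff1 (zlib).

    Hypothetical model of the rule described in the text.
    """
    # svndiff2 is only possible if the client advertised support for it.
    if not client_accepts_svndiff2:
        return 1
    # SVNCompressionLevel 1: the server overrides the client's preference.
    if server_compression_level == 1:
        return 2
    # Otherwise the client's preference decides.
    return 2 if client_prefers_svndiff2 else 1


# A client that prefers zlib but can accept svndiff2,
# talking to a server with SVNCompressionLevel 1:
print(choose_svndiff_version(True, False, 1))
```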
(B) For the on-disk data, we start using LZ4 compression by default
(in format 8 repositories).
The reasoning is that zlib compression is currently a hotspot that
can limit the performance of both read and write operations on the
repository. It also affects how well Subversion works when dealing
with large and, possibly, incompressible files (and I tend to think
that it's a fairly important use case).
Switching to a faster compression algorithm, one that is also used by
various other file system implementations, should visibly improve the
performance of such operations. Please note that this change is a
trade-off between the compression ratio and the speed of operations.
The repositories using LZ4 compression would require a bit more disk
space. The amount of the required additional space is proportional
to the difference between the compression ratio of LZ4 and zlib-5,
which can be roughly estimated as around 30-35% for compressible
binary and text files, although that may vary depending on the
To illustrate how these changes will affect the speed of some of the
operations, the 'svn import' of a 2 GB file over HTTP on LAN in my
environment takes 18 seconds instead of 63 seconds.
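For reference, here is the arithmetic behind those two timings, expressed as effective throughput. The only assumption is that 2 GB is taken as 2048 MB.

```python
SIZE_MB = 2 * 1024      # the 2 GB file, assuming 1 GB = 1024 MB
SECONDS_BEFORE = 63.0   # reported time before the changes
SECONDS_AFTER = 18.0    # reported time with the changes

throughput_before = SIZE_MB / SECONDS_BEFORE
throughput_after = SIZE_MB / SECONDS_AFTER
speedup = SECONDS_BEFORE / SECONDS_AFTER

print(f"before: {throughput_before:.1f} MB/s")
print(f"after:  {throughput_after:.1f} MB/s")
print(f"speedup: {speedup:.1f}x")
```

That works out to roughly 33 MB/s before versus 114 MB/s after, a 3.5x speedup.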
How does this sound? Are there any objections or suggestions to the
(Please note that most of the implementation is already in place, and to
get the described behavior we would just have to change a couple of default
Received on 2017-08-02 20:59:41 CEST