
RE: compression

From: Edward Ned Harvey <svn_at_nedharvey.com>
Date: Tue, 29 Jun 2010 16:57:02 -0400

> From: Daniel Shahaf [mailto:d.s_at_daniel.shahaf.name]
> Edward Ned Harvey wrote on Tue, 29 Jun 2010 at 07:15 -0000:
> > Svnserve is using standard zlib (not a parallel implementation) so
> You can disable all (most?) compression by not advertising the
> svndiff1 capability in svnserve/serve.c.


> > Some people are seeing 20min commit times on changes they could have
> > copied uncompressed in 1min.
> How do you know how long the commits would have taken with compression
> disabled?

Without svn, I "cp" or "cat > /dev/null" the file (after a reboot, with a
cold cache), so I see how long the actual IO takes. Then I benchmarked
several compression algorithms on it (lzop, gzip, bzip2, lzma, 7-zip) with
a warm cache, so that timing is 100% CPU.
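
For anyone who wants to repeat the CPU-only half of that measurement against
zlib itself, here is a minimal sketch. It is not what I ran: the helper name
time_zlib_levels and the sample data are made up for illustration, and it
calls Python's zlib module rather than svn. Levels 0, 1 and 5 are the ones
discussed in this thread; 9 is included only for comparison.

```python
import os
import time
import zlib


def time_zlib_levels(data, levels=(0, 1, 5, 9)):
    """Return {level: (seconds, compressed_size)} for each zlib level.

    The data is already in memory, so the timing reflects CPU cost only.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (elapsed, len(compressed))
    return results


# Hypothetical sample: about 4 MiB, half random bytes, half repetitive text.
sample = os.urandom(2 * 1024 * 1024) + b"svn" * (2 * 1024 * 1024 // 3)
for level, (seconds, size) in time_zlib_levels(sample).items():
    print(f"level {level}: {seconds:.3f}s, {size} bytes")
```

Swap in the bytes of a real working-copy file for the sample to get numbers
that mean something for your repository.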

These results are corroborated by the fact that the users are sometimes
competing against each other (bad performance) and sometimes they do an
update or commit while nothing else is happening. If they're lucky enough
to do their update/commit while the system is idle, it takes ~60 sec. If
two people are doing something at the same time ... it seems to scale
linearly, but the more collisions you have, the more likely you are to have
more collisions. I've had the greatest complaints for >15min commits.

So far, I've greatly improved things by just adding more cores to the
server. But I wouldn't feel like I was doing a good job, if I didn't
explore the possibility of accelerating the compression too.

> > Based on what I see in the source code, I think I can simply change the
> > compression level to 0 or 1 (instead of the default 5) or even just
> > disable compression by tweaking a few "if" statements and so forth ...
> > And recompile.
> IMO, don't disable it entirely; because that way you don't have to
> guess, at each stream, whether or not it needs decompression.

It occurs to me that you may not know this is how it already works today.
More below.

> > As far as I can tell, there is no harm in doing this. When data is read
> > back out ... If the size matches, then it was stored uncompressed, and
> > hence, no uncompression needed. If the size is less than the original
> > size, then it must have been stored compressed, and hence uncompression
> > is needed.
> A compressed file may or may not be shorter than the original file.
> You may not know the size/length in advance.

The way things are right now, svndiff's zlib_encode() takes a chunk of data,
compresses it, and writes (a) the original size of the data, and (b)
whichever is smaller: the data, or the compressed data.

Later, svndiff's zlib_decode() reads the size which zlib_encode() wrote,
reads the data which zlib_encode() wrote, and if the length of that data
doesn't match the size, decompresses it to get a chunk of data whose size
does match.
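
To make that concrete, here is a rough sketch of the same scheme. It is not
the svndiff code: encode_section and decode_section are made-up names, and
the 4-byte big-endian size prefix is a simplifying assumption (it is not the
integer encoding svndiff actually writes). The point is only that the
decoder never needs a separate "is this compressed?" flag.

```python
import struct
import zlib


def encode_section(data, level=5):
    """Write the original size, then whichever is smaller: raw or compressed."""
    compressed = zlib.compress(data, level)
    # Keep the raw bytes unless compression strictly shrinks them, so a
    # payload whose length equals the original size is always raw.
    payload = compressed if len(compressed) < len(data) else data
    # Assumption: fixed 4-byte size prefix, chosen to keep the sketch short.
    return struct.pack(">I", len(data)) + payload


def decode_section(section):
    """Compare the stored size to the payload length to decide what to do."""
    (original_size,) = struct.unpack(">I", section[:4])
    payload = section[4:]
    if len(payload) == original_size:
        return payload  # sizes match: the data was stored uncompressed
    data = zlib.decompress(payload)
    if len(data) != original_size:
        raise ValueError("size mismatch after decompression")
    return data
```

With this layout, turning the compression level down to 0 just means the raw
branch is always taken on write, and the reader needs no changes.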

> I don't like the idea of getting a stream and not *knowing* whether or
> not it's compressed.

This is the way things are right now. zlib_decode() doesn't know if it's
compressed or not, until it checks the size.

> Note that the definition of svndiff1 ("svndiff version 1" ) hard-wires
> zlib (see notes/svndiff).

Oh yeah. Thanks. ;-)
That says what I just wrote above. ;-)

"In svndiff1, in order to determine the original size, an integer is
appended to the beginning of each of the sections. If the original size
matches the encoded size (minus the length of the original size integer)
from the header, the data is not compressed. If the original size is
different than the encoded size from the header, the remaining data in the
section is compressed with zlib."
Received on 2010-06-29 22:58:13 CEST

This is an archived mail posted to the Subversion Dev mailing list.