
compression

From: Edward Ned Harvey <svn_at_nedharvey.com>
Date: Tue, 29 Jun 2010 00:15:04 -0400

Compression can be good or bad for performance. When svn manages large
binary files and all access is over a high-speed LAN, compression usually
hurts. I'm trying to help a company right now where that is the situation.
Performance is *crazy* bad because we're waiting 30-60 sec for
hundreds-of-meg files to compress, when they could have been written
uncompressed in 5 sec. Svnserve uses standard zlib (not a parallel
implementation), so each file read or written is handled serially by a
single core from start to finish. Only separate connections can use
multiple cores, and there isn't enough processor horsepower to go around,
so multiple clients end up competing for CPU cycles and thrashing each
other. Some people are seeing 20 min commit times on changes they could
have copied uncompressed in 1 min.
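
To put rough numbers on the cost described above, here is a minimal Python sketch that times one-shot zlib compression at a few levels. It is not taken from svnserve; the helper name time_levels and the use of random bytes as a stand-in for already-compressed binary content are my own assumptions, and the timings will vary by machine.

```python
import os
import time
import zlib


def time_levels(data, levels=(0, 1, 5, 9)):
    """Return {level: (seconds, compressed_size)} for one-shot zlib compression."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (elapsed, len(packed))
    return results


if __name__ == "__main__":
    # Assumption: random bytes approximate incompressible binary files.
    payload = os.urandom(16 * 1024 * 1024)
    for level, (secs, size) in time_levels(payload).items():
        print(f"level {level}: {secs:.3f}s, {size} bytes out")
```

Level 0 still wraps the data in a zlib stream, so its output is slightly larger than the input. It only skips the search for matches.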

 

Based on what I see in the source code, I think I can simply change the
compression level to 0 or 1 (instead of the default 5), or even disable
compression entirely by tweaking a few "if" statements, and then
recompile.

 

The question I have is:

 

As far as I can tell, there is no harm in doing this. When data is read
back out, if the stored size matches the original size, then it was stored
uncompressed, and hence no decompression is needed. If the stored size is
less than the original size, then it must have been stored compressed, and
hence decompression is needed. You don't need to know the original
compression level; the decompression algorithm is the same either way.
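
The read-back logic as I understand it can be sketched like this. This is Python for illustration only, not Subversion's actual code or on-disk format; the 8-byte length prefix and the names encode and decode are assumptions made up for the sketch.

```python
import struct
import zlib


def encode(data, level=5):
    """Prefix the original length; keep raw bytes unless zlib makes them smaller."""
    header = struct.pack(">Q", len(data))
    if level > 0:
        packed = zlib.compress(data, level)
        if len(packed) < len(data):
            return header + packed
    # Level 0, or compression did not help: store the bytes verbatim.
    return header + data


def decode(blob):
    """Compare the stored size with the original size to decide whether to inflate."""
    (original_len,) = struct.unpack(">Q", blob[:8])
    body = blob[8:]
    if len(body) == original_len:
        return body  # sizes match, so the data was stored uncompressed
    return zlib.decompress(body)  # inflate needs no knowledge of the level used
```

Because a compressed body is only kept when it is strictly smaller than the original, the size comparison in decode is unambiguous, and data written at any level (including "none") reads back through the same path.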

 

Can anybody confirm or deny my beliefs? "Should be good." or "Don't do
that!"

 

If I choose to contribute changes, is there any interest? If nobody cares,
I'll just keep the changes local here rather than try to contribute them
back to the source. Changes I might be interested in pursuing, if anybody
wants to encourage me slightly, are:

 

- Parallel implementation of compression (utilize multiple cores/threads)

- Configurable / disable-able compression level (config file edit, no
  recompile necessary)

- Optional compression algorithm: zlib vs bzip2 vs 7-zip (probably not
  possible to do LZO, but I'd like it, if possible)
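
For the first two items, one possible shape is to split the data into blocks and compress each block independently at a caller-chosen level. The Python sketch below is only an illustration of that idea under my own assumptions: the names compress_parallel and decompress_chunks, the 4 MB block size, and the list-of-streams output are all hypothetical, and nothing here reflects how svnserve stores data.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical block size, not an svn constant


def compress_parallel(data, level=1, workers=4, chunk_size=CHUNK_SIZE):
    """Compress fixed-size chunks independently; returns a list of zlib streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so chunk order is preserved.
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


def decompress_chunks(streams):
    """Inflate each stream and join the pieces back into the original bytes."""
    return b"".join(zlib.decompress(s) for s in streams)
```

The tradeoff is that independent blocks cannot share redundancy across block boundaries, so the ratio will usually be somewhat worse than a single stream. In CPython the zlib module releases the interpreter lock while compressing, which is what lets threads spread the work across cores here.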

 

Thanks...
Received on 2010-06-29 06:16:13 CEST

This is an archived mail posted to the Subversion Dev mailing list.
