
Re: Compressed Pristines (Design Doc)

From: Philip Martin <philip.martin_at_wandisco.com>
Date: Thu, 22 Mar 2012 23:00:12 +0000

Erik Huelsmann <ehuels_at_gmail.com> writes:

> As the others, I'm surprised we seem to be going with a custom file format.
> You claim source files are generally small in size and hence only small
> benefits can be had from compressing them, if at all, due to the fact that
> they would be of sub-block size already.

I was surprised too, so I looked at GCC where a trunk checkout has
75,000 files of various types:

$ find .svn/pristine -type f | wc -l


$ du -hs .svn/pristine
635M .svn/pristine
$ find .svn/pristine -type f | xargs ls -ls | awk '{tot += $1} END {print tot}'

Individually compressed, the store is smaller by a factor of nearly 2:

$ find .svn/pristine -type f | xargs gzip
$ du -hs .svn/pristine
367M .svn/pristine
$ find .svn/pristine -type f | xargs ls -ls | awk '{tot += $1} END {print tot}'

Concatenated into one single file, it is smaller by another factor of 3:

$ find .svn/pristine -type f | xargs cat >> one-big-file
$ du -hs one-big-file
122M one-big-file
$ ls -ls one-big-file | awk '{print $1}'

When individually compressed, most of the 75,000 files are less
than 4K:

$ find .svn/pristine -size -4096c | wc -l

more than half are less than 1K:

$ find .svn/pristine -size -1024c | wc -l

and nearly half are less than 0.5K:

$ find .svn/pristine -size -512c | wc -l

In the uncompressed state:

62323 are less than 4K
36648 are less than 1K
21828 are less than 0.5K
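
The three thresholds above can also be counted in a single pass. This is only a sketch: size_buckets is a hypothetical helper name, and it assumes GNU find (for -printf) and restricts the count to regular files, which the commands above do not.

```shell
# size_buckets DIR
# Prints three counts on one line: files smaller than 4096, 1024 and
# 512 bytes, in that order. Assumes GNU find, since -printf is not POSIX.
size_buckets() {
    find "$1" -type f -printf '%s\n' |
    awk '
        $1 < 4096 { c4k++ }
        $1 < 1024 { c1k++ }
        $1 < 512  { c512++ }
        END { printf "%d %d %d\n", c4k, c1k, c512 }
    '
}
```

For example, size_buckets .svn/pristine would report all three counts for the pristine store at once.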

Maybe GCC is not typical but, rather to my surprise, combining the
compressed files would be a significant improvement.
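
One way to see where the saving comes from is to compare the total byte count with the same total after rounding every file up to whole blocks. The following is a rough sketch, not a measurement of real disk usage: block_overhead is a hypothetical helper, the 4096-byte default block size is an assumption, GNU find is assumed for -printf, and filesystems that pack small files or store them inline will behave differently.

```shell
# block_overhead DIR [BLOCKSIZE]
# Prints two numbers: the sum of the file sizes in bytes, and the same
# sum with each file rounded up to a whole number of blocks.
# BLOCKSIZE defaults to 4096, which is an assumption about the filesystem.
block_overhead() {
    dir=$1
    bs=${2:-4096}
    find "$dir" -type f -printf '%s\n' |
    awk -v bs="$bs" '
        {
            bytes  += $1
            blocks += int(($1 + bs - 1) / bs)   # round up to whole blocks
        }
        END { printf "%.0f %.0f\n", bytes, blocks * bs }
    '
}
```

Running block_overhead .svn/pristine after the gzip step should give figures comparable to the one-big-file size and the du total respectively, to the extent that the block-size assumption holds.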

I also have an httpd trunk checkout (needs cleanup so bigger than

90M uncompressed
37M individually compressed
23M as one big file

That's more like your figures for Subversion where the major step is
individual compression.

Received on 2012-03-23 00:00:50 CET

This is an archived mail posted to the Subversion Dev mailing list.