
Impact of SVN_DELTA_WINDOW_SIZE on large files with small differences

From: Charles Butterfield <charles.butterfield_at_nextcentury.com>
Date: Sun, 17 Feb 2008 20:23:21 -0500

One of my users commits a large text file (50 MB) containing minor
changes every couple of days. I was surprised at the size of the
deltas and did some crude testing (which was flawed by my lack of
understanding of the issues). I reported this in the thread titled
"Binary differencing performs poorly (erractically) on very large text
file", and got some excellent guidance.

It turns out the erratic aspect was largely due to my ignorance of the
skip-deltas algorithm, which I made worse by checking in a zero-length
initial version for my "controlled tests". Oh well.

However, I was also introduced to the SVN_DELTA_WINDOW_SIZE (currently a
#define) and wondered what the effect of varying it would be. Here are
my results from revised testing with a specific (real) set of sample
files.

Results:

Delta     Local    Remote   Rev 0    Average
Window    Commit   Commit   Size     Rev 1-9 Size
Size      Time     Time              (bytes)
------    ------   ------   ------   ------------
100KB       73       71     11.5MB   2,138,000
1MB         66       61      8.3MB     174,600
10MB       103       84      6.8MB      20,535
100MB      248      156      5.9MB       4,800

Observations
------------
1) The penalty for the number of windows is roughly linear until the
   window is the size of the file. In this case the file is 50 MB
   with an average delta of 4,800 bytes. A 10 MB window size uses
   5 windows and bumps the average delta to about 20 KB. A 1 MB window
   bumps the average delta by roughly another factor of 10, and so on.

2) A big window compresses better too: the rev 0 size drops from
   11.5 MB with a 100 KB window to 5.9 MB with a 100 MB window.

3) Running the client remotely, over 1 Gb Ethernet, is faster than
   running it locally.

4) The 1 MB window was actually faster than the 100 KB window, which
   was surprising. Beyond that, there is a performance hit for larger
   windows.

5) "make check" started to fail some tests at WIN=100MB, but
   the server and client (local and remote) seemed to work fine.
   Perhaps SEEMED is the operative word?
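
To make observation 1 concrete, here is a rough Python sketch that
divides each measured average delta by the number of windows needed to
cover the 50 MB file. The figures come from the table above; the
binary unit sizes (1 KB = 1024 bytes) and the helper name window_count
are my assumptions for illustration, not anything taken from the
Subversion source.

```python
KB = 1024          # assumption: binary units
MB = 1024 * KB

FILE_SIZE = 50 * MB  # size of the committed text file

# (window size in bytes, average Rev 1-9 delta size in bytes), from the table
RESULTS = [
    (100 * KB, 2_138_000),
    (1 * MB, 174_600),
    (10 * MB, 20_535),
    (100 * MB, 4_800),
]


def window_count(file_size, window_size):
    """Number of fixed-size windows needed to cover the file (ceiling division)."""
    return -(-file_size // window_size)


for window_size, avg_delta in RESULTS:
    n = window_count(FILE_SIZE, window_size)
    print(f"{window_size:>11,} B window: {n:>3} windows, "
          f"{avg_delta / n:,.0f} delta bytes per window")
```

Under these assumptions the per-window cost stays between roughly
3,500 and 4,800 bytes across all four window sizes, which is what
"roughly linear in the number of windows" would predict.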

Conclusions
-----------
For really big files, the large number of windows causes small deltas to
be magnified linearly in the number of windows. Allowing a large window
size seems to have positive effects on both the encoding of deltas and
the compression of the "base" revision.
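
As a back-of-the-envelope illustration of the combined effect, the
Python sketch below adds the rev 0 size to nine average deltas for each
window size in the results table. It assumes 1 MB = 1,000,000 bytes
for the rev 0 column and that the total is simply rev 0 plus nine
deltas; the function name total_stored is made up for this example.

```python
# window size label -> (rev 0 size in bytes, average Rev 1-9 delta in bytes)
# Assumption: the "MB" figures in the results table mean 1,000,000 bytes.
RESULTS = {
    "100KB": (11_500_000, 2_138_000),
    "1MB": (8_300_000, 174_600),
    "10MB": (6_800_000, 20_535),
    "100MB": (5_900_000, 4_800),
}


def total_stored(rev0_bytes, avg_delta_bytes, delta_revs=9):
    """Estimated bytes stored for rev 0 plus delta_revs average-sized deltas."""
    return rev0_bytes + delta_revs * avg_delta_bytes


for label, (rev0, avg_delta) in RESULTS.items():
    total = total_stored(rev0, avg_delta)
    print(f"{label:>6} window: {total:>12,} bytes for revs 0-9 "
          f"({100 * 9 * avg_delta / total:.1f}% in deltas)")
```

With the 100 KB window the nine deltas take more space than the base
revision itself, while with the 100 MB window they add under 1% to it.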

With my admin hat on, I would sure like the ability to specify a larger
window size (within limits) when processing repos with large files,
perhaps only when processing those large files.

So here is some food for thought if a developer gets bored.

Regards
-- Charlie

Received on 2008-02-18 02:23:39 CET

This is an archived mail posted to the Subversion Users mailing list.