Re: caching proxies and SVN network perf

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2000-10-24 00:46:52 CEST

> Also, our SVNDIFF format is quite good. HTTP can also (transparently)
> add GZIP encoding on top of that automatically. The GZIP will
> squeeze our diffs down, but also original checkouts, too!

A couple of points here:

        * svndiff will compress original checkouts without additional
          gzipping. You just make a diff against an empty source.
          (It's not as good a compressor as gzip, of course.)

        * Although gzip can compress svndiff output somewhat, it can't
          do so as well as it could compress the original data. So,
          my previous point notwithstanding, you'd be better off *not*
          using svndiff for an original checkout if you know your
          transport is going to be gzipped.
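To illustrate the first point, here is a minimal sketch that uses a toy delta format, not svndiff's actual wire encoding. The op names `new` and `copy_target` and the greedy matcher are hypothetical. Against an empty source, the only way to save space is to copy from earlier parts of the target itself, LZ77-style, so repetitive input still shrinks:

```python
# Toy delta encoder (hypothetical format, NOT svndiff): with an empty source,
# compression comes only from copying bytes already produced in the target.

def delta_against_empty(target: bytes, min_match: int = 4) -> list:
    """Return ("new", literal_bytes) and ("copy_target", offset, length) ops."""
    ops = []
    pending = bytearray()  # literal bytes not yet emitted
    i = 0
    while i < len(target):
        best_len, best_off = 0, 0
        # Naive O(n^2) search for the longest earlier match; overlap is allowed.
        for off in range(i):
            length = 0
            while (i + length < len(target)
                   and target[off + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, off
        if best_len >= min_match:
            if pending:
                ops.append(("new", bytes(pending)))
                pending.clear()
            ops.append(("copy_target", best_off, best_len))
            i += best_len
        else:
            pending.append(target[i])
            i += 1
    if pending:
        ops.append(("new", bytes(pending)))
    return ops


def apply_delta(ops: list) -> bytes:
    """Rebuild the target from the ops (the source is empty, so it is unused)."""
    out = bytearray()
    for op in ops:
        if op[0] == "new":
            out += op[1]
        else:
            _, off, length = op
            for k in range(length):  # byte at a time so overlapping copies work
                out.append(out[off + k])
    return bytes(out)


if __name__ == "__main__":
    data = b'(defun foo () "doc")\n' * 50
    ops = delta_against_empty(data)
    assert apply_delta(ops) == data
    print(ops)  # one literal line, then a single copy of the rest
```

A real encoder would use hashing instead of the quadratic scan. The point is only that a diff against nothing can still compress.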

Here are some data points using the first of my elc data sets (all
numbers are for performing the operations on each file individually
and then totalling the results; diff operations were performed against
a source of /dev/null):

        Raw size: 6912321
        svndiff alone: 3471614 (50%)
        svndiff+gzip: 2976929 (43%)
        gzip alone: 2541305 (37%)

        svndiff+base64: 4690465 (68%)
        svndiff+base64+gzip: 3332093 (48%)
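The methodology above (transform each file separately, then total the sizes) can be sketched as a small harness. This is a hypothetical reconstruction, not the script that produced these numbers. svndiff is not in the Python standard library, so a real run would need an svndiff step plugged into `pipelines`, and that step is not shown here:

```python
import base64
import gzip
from typing import Callable, Dict, Iterable, List

# One transform per step, e.g. gzip.compress or base64.encodebytes.
Transform = Callable[[bytes], bytes]


def totals(files: Iterable[bytes],
           pipelines: Dict[str, List[Transform]]) -> Dict[str, int]:
    """Run every pipeline on each file separately and sum the output sizes."""
    files = list(files)
    result = {"raw": sum(len(f) for f in files)}
    for name, steps in pipelines.items():
        total = 0
        for data in files:
            for step in steps:  # steps are applied in order
                data = step(data)
            total += len(data)
        result[name] = total
    return result


def percent(size: int, raw: int) -> int:
    """Size as a whole-number percentage of the raw total."""
    return round(100 * size / raw)


def report(result: Dict[str, int]) -> None:
    raw = result["raw"]
    for name, size in result.items():
        print(f"{name}: {size} ({percent(size, raw)}%)")


if __name__ == "__main__":
    # Stand-in data; the real runs used .elc files and also an svndiff step.
    files = [b'(defun f () "doc")\n' * 200, bytes(range(256)) * 4]
    report(totals(files, {
        "gzip alone": [gzip.compress],
        "base64": [base64.encodebytes],
        "base64+gzip": [base64.encodebytes, gzip.compress],
    }))
```

`percent` reproduces the rounded figures in the table from the raw byte counts. The svndiff+base64 row is about 1.35 times the svndiff row. That ratio is roughly what base64's 4/3 expansion plus periodic line breaks would predict, although the message does not say which base64 encoder was used.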

I don't think network bandwidth usage really drives performance
as seen by the end user, but I thought I'd pipe up anyway. We're
unlikely to do worse than the CVS pserver unless HTTP overhead becomes
really cumbersome.

As long as I'm talking about performance, I'll note that I took a look
at one of the .elc files I was using, and .elc files are a really poor
example of "binary data." You mostly get a lot of doc strings (which
contain newlines) interspersed with short stretches of bytecode, so
apart from the presence of funny characters, they look a lot like text
files as far as diff is concerned. I plan to create some better
binary test data by compiling two versions of a program and comparing
the resulting object files (yet another idea stolen from the Hunt
paper). I've also discovered that you can save about 5% of total
output size by using the fourth possible instruction code for "copy
with offset relative to last copy instruction", but I didn't think a
5% savings was worth the complexity.
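Here is a rough illustration of why relative offsets save space. It assumes that offsets are stored as 7-bits-per-byte variable-length integers, which is the encoding svndiff uses for its integers. It also assumes a hypothetical rule that "relative" means relative to the end of the previous copy. A signed difference needs an unsigned mapping; zigzag encoding is used here purely as an illustrative choice. The sketch counts offset bytes only and makes no attempt to reproduce the 5% figure:

```python
def varint_len(n: int) -> int:
    """Bytes needed to store n as an unsigned 7-bits-per-byte varint."""
    if n < 0:
        raise ValueError("varint_len takes unsigned values only")
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def offset_bytes(copies: list) -> tuple:
    """Total offset bytes for (offset, length) copies: (absolute, relative).

    The relative form (hypothetical) stores offset minus the end of the
    previous copy, zigzag-encoded so that backward jumps are representable.
    """
    absolute = relative = 0
    prev_end = 0
    for offset, length in copies:
        absolute += varint_len(offset)
        relative += varint_len(zigzag(offset - prev_end))
        prev_end = offset + length
    return absolute, relative


if __name__ == "__main__":
    # Nearby copies deep into a large source file.
    print(offset_bytes([(100000, 500), (100600, 300), (100950, 2000)]))
```

Copies that walk forward through the source in small steps get cheap offsets. A separate instruction code would keep absolute offsets available for long jumps.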
Received on Sat Oct 21 14:36:12 2006

In reply to: Greg Stein: "caching proxies and SVN network perf"