
Re: caching proxies and SVN network perf

From: Brian Behlendorf <brian_at_collab.net>
Date: 2000-10-24 18:03:51 CEST

On 24 Oct 2000, Karl Fogel wrote:
> "Premature optimization is the root of all evil."

Agreed, but somewhat on that note, for future reference...

Greg and I talked over lunch about optimizing performance when going
through proxies, specifically the issue I posted about earlier: diffs
that span multiple versions, and whether proxies could be "smart" about
serving requests for subsets or overlapping ranges of them. To recap
(and to use a better example):

Bob has an SVN tree with a file foo.c, v1.5.
Jane has an SVN tree with a file foo.c, v1.6.
The central repository has since been updated, and has v1.7.

Bob does an update. He gets diff(1.5,1.7) long-distance from the central
server, and that response is cached by the proxy.

Jane does an update. SVN determines she needs diff(1.6,1.7); the proxy
sees it doesn't have that, and fetches it long-distance from the central
server.

This isn't optimal: the proxy already "has" that delta, buried inside the
cached diff(1.5,1.7), but it has no way to know that.
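To make the miss concrete, here's a toy model (in Python) of a proxy cache
keyed purely on the request URL; the URL shapes are invented just for
illustration, not a proposal for what the requests should look like:

# Toy model of a URL-keyed proxy cache.
cache = {}

def fetch(url):
    """Serve from the proxy cache if possible, else go 'long-distance'."""
    if url in cache:
        return "proxy hit"
    cache[url] = "...diff bytes from the central server..."
    return "long-distance fetch"

print(fetch("/repo/foo.c?diff=1.5:1.7"))  # Bob:  long-distance fetch
print(fetch("/repo/foo.c?diff=1.6:1.7"))  # Jane: long-distance fetch again,
                                          # even though her delta is contained
                                          # in the cached 1.5:1.7 response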

It may be that it's more efficient for an svn client, when it "knows" it's
going through a proxy server, to always ask for discrete per-revision
patches. I.e., make requests to the server for diff(1.5,1.6) and
diff(1.6,1.7). That way, when Jane requests diff(1.6,1.7), it can be served
directly by the proxy. Thanks to persistent connections and pipelining in
HTTP/1.1, the two requests can be sent back-to-back on one connection and
the responses returned in order, so we largely avoid the latency penalty
that two separate round trips would imply.
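
A rough sketch (Python again, just to show the mechanics) of sending two
discrete diff requests through a proxy on one pipelined HTTP/1.1
connection; the proxy host, port, and URL shapes are all invented, and
whether a given proxy handles pipelining gracefully is a separate question:

import socket

HOST, PORT = "svn-proxy.example.com", 3128   # hypothetical caching proxy

def diff_request(spec, last=False):
    # Forward-proxy style request: absolute URI in the request line.
    return (
        f"GET http://svn.example.com/repo/foo.c?diff={spec} HTTP/1.1\r\n"
        f"Host: svn.example.com\r\n"
        f"Connection: {'close' if last else 'keep-alive'}\r\n"
        f"\r\n"
    ).encode("ascii")

with socket.create_connection((HOST, PORT)) as sock:
    # Send both requests back-to-back without waiting for the first
    # response; this is where the second round trip is saved.
    sock.sendall(diff_request("1.5:1.6") + diff_request("1.6:1.7", last=True))

    # Responses come back in order on the same connection; a real client
    # would parse Content-Length / chunked framing to split them apart.
    reply = b""
    while chunk := sock.recv(4096):
        reply += chunk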

We potentially have higher HTTP header overhead, but only statistical
analysis of actual traffic can tell us whether that's significant. It's
also the only way to tell how much benefit we'd actually get, since that
depends on what percentage of svn update traffic on an "average" project
consists of single-version updates versus multi-version ones. I suspect
it's a lot if we include svn checkout in this, since a checked-out file
stops getting cache hits as soon as its version is bumped: a checkout of
version 1.6 of really-big-file, followed by a later checkout of the same
file after an update, leaves essentially two full copies of really-big-file
in the proxy cache, only one of which will ever potentially be hit again.
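
The checkout case, in the same toy cache model as above (URLs invented
again), looks like this:

cache = {}

def fetch(url):
    if url in cache:
        return "proxy hit"
    cache[url] = "...full contents of really-big-file..."
    return "long-distance fetch"

print(fetch("/repo/really-big-file?rev=1.6"))  # first checkout: long-distance
print(fetch("/repo/really-big-file?rev=1.6"))  # same rev again: proxy hit
print(fetch("/repo/really-big-file?rev=1.7"))  # checkout after the update:
                                               # long-distance again, and the
                                               # cached 1.6 copy will likely
                                               # never be hit again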

Anyways, again, premature optimization. But it's fun to think about
this.

        Brian