Re: caching proxies and SVN network perf

From: Greg Stein <gstein_at_lyra.org>
Date: 2000-10-24 13:40:41 CEST

On Tue, Oct 24, 2000 at 03:42:55AM -0700, Brian Behlendorf wrote:
> On Tue, 24 Oct 2000, Greg Stein wrote:
> > It could probably do diffs, but I'll have to get some stuff implemented
> > because I'm not exactly sure how we'll be implementing the diff draft
> > (referenced from the webdav-design.html document in CVS). Specifically, that
> > should have some "Vary:" headers which would help control how the proxies
> > will cache and under what "key", if you will.
>
> It won't cache intelligently, though (diffs between arbitrary versions
> that are a subset of what's been fetched) without some serious work at the
> caching proxy level. Unless every update is fetched as a series of diffs
> between sequential versions of a file (e.g., updating from 1.2 to 1.4 of a
> given file transfers diff(1.2, 1.3) and diff(1.3, 1.4) instead of diff
> (1.2, 1.4)), then it's going to be difficult for the proxy to respond to a
> request for diff(1.3, 1.4) if all it's got is diff (1.2, 1.4). So some
> way of supporting that latter case would be really interesting to me.

Not a problem at all... There are two parts here:

1) Each version of a file has its own URL. Therefore, an HTTP GET of that
   URL means that the cache can retain copies of the individual version.

2) For a diff (say, from 5 to 7), SVN asks for v7 of the resource and
   appends a header to the request stating "I have v5 and understand <these>
   diff formats." The server can then return a diff from v5 to v7.

The trick is to include a Vary: header which refers to those extra diff
headers. The cache keys its values based on the URL and the contents of the
headers listed in the Vary: header.

The next person to ask for v7 with v5 listed in the diff header will get
that same response straight from the cache.

[ It would be nice to give a concrete example here, but like I said: I need
  to really dig into the diff-draft and concretely explore how this will be
  done. ]

IOW, any HTTP/1.1 caching proxy that properly obeys the Vary: header (per
the HTTP spec) *will* cache diff responses. The cache can also hold
individual versions of a file since each has a unique URL.
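
A rough sketch of the keying idea, not of the diff draft itself: the header
names X-Have-Version and X-Accept-Diff and the "example-diff" format are made
up for illustration, and the VaryCache class is hypothetical. It only shows
that a cache keyed on the URL plus the headers named in Vary: can hold a full
version and a diff under the same URL without mixing them up.

```python
# Sketch of Vary-aware cache keying. Header names are hypothetical
# placeholders, not the ones the diff draft will actually define.

def cache_key(url, request_headers, vary):
    """Key = URL plus the request's values for every header named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    return (url, tuple((n, lowered.get(n, "")) for n in names))


class VaryCache:
    def __init__(self):
        self._vary = {}     # url -> Vary value seen on the stored response
        self._entries = {}  # cache key -> response body

    def store(self, url, request_headers, vary, body):
        self._vary[url] = vary
        self._entries[cache_key(url, request_headers, vary)] = body

    def lookup(self, url, request_headers):
        vary = self._vary.get(url)
        if vary is None:
            return None  # nothing cached for this URL yet
        return self._entries.get(cache_key(url, request_headers, vary))


VARY = "X-Have-Version, X-Accept-Diff"
v7 = "http://host/repos/$svn/ver/10.100.7"
have_v5 = {"X-Have-Version": "10.100.5", "X-Accept-Diff": "example-diff"}

cache = VaryCache()
cache.store(v7, {}, VARY, b"full text of v7")         # plain GET of v7
cache.store(v7, have_v5, VARY, b"diff from v5 to v7")  # GET of v7, holding v5

# A second client holding v5 hits the cached diff; one holding v6 misses.
hit = cache.lookup(v7, have_v5)
miss = cache.lookup(v7, {"X-Have-Version": "10.100.6",
                         "X-Accept-Diff": "example-diff"})
```

A request with no diff headers still finds the full v7 text, because its key
carries empty values for both headers.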

> > But the bulk: big time.
>
> Given checkouts involve a big batch of bytes initially, sure. However, a
> checkout of a file at a given version on one date and of the same file after
> the next commit are not going to be optimized; they'll have separate cache
> keys. I think.

Yes, they will have different keys. For example:

    http://host/repos/$svn/ver/10.100.5
    http://host/repos/$svn/ver/10.100.7

[ I don't know that I got those nodestrings right, but you get the point ]

> So it's not quite as romantic as 100% replicated
> repositories - though getting there can be done, and won't require
> modifications to the installed client base to get there.

Well, the *first* person will load the cache... sure, they aren't going to
get a quick load from the cache. But the *next* person will see the benefit.
And when you get a whole group of people updating... great benefit.

> > Just think... an SVN repository getting picked up and shared across the
> > akamai caching network! Woo!
>
> That might be tough, though, since Akamai is a read-only network, as far
> as I know. A svn checkout of http://blah.akamai.net/blah implies a commit
> against that same resource, doesn't it? I'm sure Akamai could implement
> something that supports that, but it won't just work out of the box.

"svn checkout" is a read-only operation. It certainly could be loaded into a
caching network.

Now, does akamai simply cache stuff out of the blue? No idea. It seems that
people may need to have a biz relationship with akamai first. *shrug* My
point wasn't to provide a concrete example, but to point out that a caching
network *could* create some scaling benefits for SVN repositories.

> BTW, replicated repositories is something we (collab.net) are *very*
> interested in helping see happen.

Read-only copies shouldn't be too difficult. When somebody does a commit, we
just redirect to the master. Having multiple masters that must resolve
conflicts between them... icky. That will be a bitch. Although, I bet there
is a lot of theory out there on how to do this, so it might be a "simple"
reduction of theory to practice. Theoretically. :-)
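
A minimal sketch of the read-only-copy routing, assuming the mirror can tell
reads from writes by HTTP method alone. The method list, the route() helper,
and the master URL are illustrative assumptions, not anything SVN defines:

```python
# Hypothetical routing rule for a read-only mirror: answer reads locally,
# redirect anything that could modify the repository to the master.

READ_METHODS = {"GET", "HEAD", "OPTIONS", "PROPFIND"}  # assumed read-only set


def route(method, path, master="http://master.example/repos"):
    """Return (status, location); location is None when served locally."""
    if method.upper() in READ_METHODS:
        return (200, None)
    # 307 asks the client to repeat the same method against the master.
    return (307, master + path)


local = route("GET", "/trunk/README")
redirected = route("PUT", "/trunk/README")
```

Here a GET stays on the mirror, while a PUT comes back as a 307 pointing at
the same path on the master.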

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Oct 21 14:36:12 2006

This is an archived mail posted to the Subversion Dev mailing list.
