Re: caching (was: Re: Subversion presentation at SVLUG)

From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2001-04-09 08:27:44 CEST

Greg Stein <gstein@lyra.org> writes:
> > Well, there's a bit more work needed. To handle update requests
> > helpfully, your cache needs to be able to answer requests for deltas
> > between different versions, not just the texts themselves. So those
> > caches are going to have to be pretty smart --- if the caches can
> > produce deltas, I'd call that a distributed repository.
>
> It will be possible to cache deltas. If the main server generates a
> particular delta pair, then the cache can store that pair. It will be keyed
> with the pair information and can be served up to the next person who needs
> it.

We discussed this before. If the cache has the delta between
revisions 10 and 11, and between 11 and 12, and the client asks for
the delta between 10 and 12, you'd like the cache to be able to answer
without asking the master repository. I'm guessing that the caching
approach you have in mind could only serve exactly those deltas that
it had seen before, even if it had other deltas it could combine to
yield the one the client requested --- is that right?
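
To make the distinction concrete, here is a rough sketch in Python. It
is not Subversion code: the DumbCache and SmartCache classes, the
compose() helper, and the representation of a delta as a dict mapping
path to new contents (None meaning "deleted") are all assumptions made
up for illustration. Real tree deltas are richer than this.

```python
from collections import deque

# Hypothetical model: a delta maps path -> new contents (None = deleted).

def compose(first, second):
    """Return a delta equivalent to applying `first`, then `second`."""
    combined = dict(first)
    combined.update(second)  # later changes override earlier ones
    return combined


class DumbCache:
    """Serves only the exact (src, dst) deltas it has stored."""

    def __init__(self):
        self.store = {}

    def put(self, src, dst, delta):
        self.store[(src, dst)] = delta

    def get(self, src, dst):
        return self.store.get((src, dst))  # None means: ask the master


class SmartCache(DumbCache):
    """Also answers by chaining stored deltas, e.g. 10->11 plus 11->12."""

    def get(self, src, dst):
        if src == dst:
            return {}
        exact = self.store.get((src, dst))
        if exact is not None:
            return exact
        # Breadth-first search over stored pairs for a chain src -> dst.
        queue = deque([(src, {})])
        seen = {src}
        while queue:
            rev, acc = queue.popleft()
            for (a, b), delta in self.store.items():
                if a != rev or b in seen:
                    continue
                combined = compose(acc, delta)
                if b == dst:
                    return combined
                seen.add(b)
                queue.append((b, combined))
        return None  # no chain available: ask the master
```

With deltas for 10->11 and 11->12 stored, DumbCache.get(10, 12) misses
and has to go to the master, while SmartCache.get(10, 12) can answer
locally by combining the two.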

Is this issue significant enough for users to care? I think so. When
I do an update, I usually find myself getting the results of several
commits at a time. The larger the tree, the more updates I get --- in
Red Hat's comp-tools (gcc, gas, binutils, gdb), it looks like about a
dozen per day. This suggests that people will often be asking for
deltas that span many revisions. If people reach each revision from
(say) a dozen different prior revisions, then a proxy which doesn't
understand the big picture is going to retrieve and store a dozen
deltas reaching each revision.

The upside of dumb caches is that a dumb cache could probably spit out
a delta it has cached much faster than a smart cache could generate
that same delta from fulltexts.

The downside is that a dumb cache has to retrieve each of those deltas
separately from the master repository over the net, when a smart cache
would only need to fetch one per revision. (And the deltas a smart
cache needs --- spanning single commits --- will usually be smaller
than the multiple-commit deltas a dumb cache will usually be
transferring.)
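
As a back-of-the-envelope illustration of the fetch counts, here is a
small Python sketch. The function names are hypothetical, and it
assumes a single linear sequence of integer revision numbers with
src < dst in every request; it counts how many deltas each kind of
cache would have to pull from the master, not how large they are.

```python
def dumb_fetches(requests):
    """One fetch from the master per distinct (src, dst) pair requested."""
    return len(set(requests))


def smart_fetches(requests):
    """One fetch per distinct single-commit delta the requests cover."""
    steps = set()
    for src, dst in requests:
        for rev in range(src, dst):
            steps.add((rev, rev + 1))
    return len(steps)


# Every revision from 2 to 20 is requested from each of up to three
# prior revisions (a scaled-down version of the "dozen" case).
requests = [
    (head - span, head)
    for head in range(2, 21)
    for span in range(1, 4)
    if head - span >= 1
]
print(dumb_fetches(requests), smart_fetches(requests))
```

The gap widens as the number of distinct starting revisions per target
revision grows, since the smart cache's count depends only on how many
commits are covered.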

My guess is that a cache which really understands that it's dealing
with successive revisions of a tree will be sufficiently more
effective that people will bother to install it, if someone writes
one. Since the filesystem library is independent of the network
protocol, and not tailored to a specific set of commands, it should be
straightforward to reuse it to implement a smart cache.
Received on Sat Oct 21 14:36:28 2006
