[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: A forward-thinking thought about deltas in FS design

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2003-07-07 07:43:17 CEST

On Sun, 2003-07-06 at 12:18, Tom Lord wrote:
> Have you considered changing the top row of your diagram [to]:

> 0 <- 1 <- 2 <- 3 <- 4 <- 5 <- 6 <- 7 <- 8 <- 9 <- 10 <- 11 <- 12 ....

Yes, but I don't think there's an advantage.

> So if you cached the other arrows in your diagram, and the cache is
> given roughly the same amount of disk space as is needed for all the
> arrows in your diagram, then the cache will function as a memo, and
> you'll have the same performance guarantees that you have now.

So, I'm never quite sure how you use the word "cache". In my world, a
cache is (1) something you populate on demand, and (2) something you can
throw away. You don't typically get performance "guarantees" with a
cache because you don't know if it has been populated with the fast path
you want since the last time it has been thrown away.

For guaranteed acceptable performance, you cannot treat fast path data
as expendable. It is too important for ensuring reasonable access times
and it takes too long to rebuild. So it has to be updated reliably, it
has to be backed up just as well, it has to be replicated just as well.

So either you're using the word "cache" loosely (a better word would be
"augmentation"), or you're fast-talking around the GAP requirement.

> The caching/memoizing approach would make it
> easier, I think, to be computing skip-deltas concurrently and
> opportunistically.

Why would you want to? It doesn't take more time (on average) to
compute the skip-delta than it does to compute the single delta.

> Read-only mirrors and clients could be building their own caches.

Again, no advantage versus my approach; read-only mirrors and clients
could be building their own caches there too.

> Given the kinds of access needed to "row 0",
> perhaps it could be stored more compactly and updated more quickly.

I don't see this translating into a practical advantage.

> A cache could easily revert to full-texts when skip-deltas become
> irrational.

Why would they be any more likely to become irrational than single
deltas would be?

> In a long-lived, busy archive, I'm not so sure the
> performance guarantee you are after is really worth more than a
> guarantee of _better_ performance where the access patterns are hot --
> and the caching approach can make that trade-off smoothly (if it takes
> me 3x as long to retrieve a 7-year old kernel that nobody else has
> fetched for years, but 1/2 as long to get last months ....).

We're not talking about a factor of 3x here; we're talking about O(n)
versus O(lg(n)), which can quickly become a factor of 100 or 1000.

> The simpler row-0 representation

>From the point of view of the current Subversion design, the single
delta representation isn't really any simpler. A rep is either plain
text or a delta against another rep; skip-deltas just change what the
"other rep" is. All the complexity of skip-deltas (which isn't very
much) is in the algorithm, not on disk. (Tiny exception: to implement
skip deltas, I had to add a predecessor count to node-revisions.)

> , being the only txnal data needed, can
> help stabilize the implementation of the storage manager and simplify
> its design. The cache implementation could be made independent of the
> primary storage manager internals -- making it easier to write new
> storage manager implementations (and new caching-engine
> implementations).

You've proposed a tiny little simplification of the primary storage
manager at the expense of decomposing the storage system into two parts,
radically increasing the overall complexity of the system. Not a win.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Jul 7 07:50:09 2003

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.