
Re: A forward-thinking thought about deltas in FS design

From: Tom Lord <lord_at_emf.net>
Date: 2003-07-07 08:21:28 CEST

> From: Greg Hudson <ghudson@MIT.EDU>

> > Have you considered changing the top row of your diagram [to]:

> > 0 <- 1 <- 2 <- 3 <- 4 <- 5 <- 6 <- 7 <- 8 <- 9 <- 10 <- 11 <- 12 ....

> Yes, but I don't think there's an advantage.

I pointed out several.

> > So if you cached the other arrows in your diagram, and the cache is
> > given roughly the same amount of disk space as is needed for all the
> > arrows in your diagram, then the cache will function as a memo, and
> > you'll have the same performance guarantees that you have now.

> So, I'm never quite sure how you use the word "cache". In my world, a
> cache is (1) something you populate on demand, and (2) something you can
> throw away.

That's correct.

> You don't typically get performance "guarantees" with a
> cache because you don't know if it has been populated with the fast path
> you want since the last time it has been thrown away.

That's false.

You have to consider (a) your algorithm that decides what to cache and
what to throw away; (b) the size of your cache.

Under (often quite reasonable) interactions of (a) and (b), you can
make strong guarantees.
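
To make the interaction of (a) and (b) concrete, here is a minimal sketch. It is not taken from this thread or from Subversion's code. It assumes one common skip-delta rule (the skip base of revision n is n with its lowest set bit cleared), and the names skip_base and hops are hypothetical. It counts how many deltas must be applied to rebuild a revision when some, all, or none of the skip arrows are present in the cache.

```python
def skip_base(n: int) -> int:
    """Assumed skip-delta rule: the base of n is n with its lowest set bit cleared."""
    return n & (n - 1)


def hops(rev: int, cached: set) -> int:
    """Count the deltas applied to rebuild `rev` starting from revision 0.

    If the skip arrow for a revision is in `cached`, follow it; otherwise
    fall back to the always-present single-delta arrow rev -> rev - 1.
    """
    count = 0
    while rev > 0:
        rev = skip_base(rev) if rev in cached else rev - 1
        count += 1
    return count


# Policy (a): never evict a skip arrow. Size (b): room for one arrow per revision.
full_cache = set(range(1, 13))

print(hops(12, full_cache))  # every skip arrow present: 12 -> 8 -> 0
print(hops(12, {8}))         # partly populated cache: walk down to 8, then skip
print(hops(12, set()))       # empty cache: plain single-delta chain
```

Under that policy and size, the hop count for revision n is the number of set bits in n, so the logarithmic bound holds as firmly as it would if the arrows lived in primary storage.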

> For guaranteed acceptable performance, you cannot treat fast path data
> as expendable. It is too important for ensuring reasonable access times
> and it takes too long to rebuild. So it has to be updated reliably, it
> has to be backed up just as well, it has to be replicated just as well.

You're oversimplifying quite a bit. I'm not even sure where to begin
debugging you. I'll try anyway.

Given a cache such as I described, the hard guarantees you're worried
about hold. You can back it up if you like. You can also send in a
junior-clueless-admin to run "rm -rf" on a portion of the cache -- and
then what happens? The performance guarantees are lost for a few
hours. The system gracefully carries on anyway. Then everything is
back to normal.
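
The "rm -rf" scenario can be sketched as a toy model. The SkipCache class is hypothetical, the skip-base rule (clear the lowest set bit) is an assumption, and the work of actually computing a skip-delta is elided; only the bookkeeping is modeled. Reads fall back to the single-delta chain when an arrow is missing and repopulate the cache as they go.

```python
def skip_base(n: int) -> int:
    """Assumed skip-delta rule: the base of n is n with its lowest set bit cleared."""
    return n & (n - 1)


class SkipCache:
    """Toy memo of skip arrows; the single-delta chain is assumed to live elsewhere."""

    def __init__(self) -> None:
        self.arrows = set()

    def read(self, rev: int) -> int:
        """Return how many deltas were applied to rebuild `rev` from revision 0.

        A missing skip arrow costs one single-delta step and is memoized,
        so a later read of the same revision takes the fast path.
        """
        cost = 0
        cur = rev
        while cur > 0:
            if cur in self.arrows:
                cur = skip_base(cur)   # fast path
            else:
                self.arrows.add(cur)   # populate on demand
                cur -= 1               # slow path: single delta
            cost += 1
        return cost

    def wipe(self) -> None:
        """Simulate the admin deleting the cache."""
        self.arrows.clear()


cache = SkipCache()
print(cache.read(12))  # cold cache: slow, but correct
print(cache.read(12))  # warm cache: fast
cache.wipe()
print(cache.read(12))  # slow again right after the wipe
print(cache.read(12))  # and back to normal
```

Nothing in the primary chain depends on the cache, so a wipe costs time and never correctness.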

> So either you're using the word "cache" loosely

No -- you're using it to mean more than it means.

> > The caching/memoizing approach would make it
> > easier, I think, to be computing skip-deltas concurrently and
> > opportunistically.

> Why would you want to?

For better actual, real-world performance.

> It doesn't take more time (on average) to
> compute the skip-delta than it does to compute the single delta.

That isn't the point. It's not relevant to anything I said.

> > Read-only mirrors and clients could be building their own caches.

> Again, no advantage versus my approach; read-only mirrors and clients
> could be building their own caches there too.

In general, you seem to be armed with a lot more basic theory than
non-basic theory or practice.

> > Given the kinds of access needed to "row 0",
> > perhaps it could be stored more compactly and updated more quickly.

> I don't see this translating into a practical advantage.

Um, much faster and more space-efficient write-txns?

> > A cache could easily revert to full-texts when skip-deltas become
> > irrational.

> Why would they be any more likely to become irrational than single
> deltas would be?

Hello?!?! Once again, large skip-deltas on busy files are going to
converge on full-text sizes.
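
A rough illustration of that convergence, under a deliberately crude assumption that is not from this thread: every commit rewrites exactly one line, cycling round-robin through the file. A delta spanning k revisions then touches min(k, file size) lines. The function names and the 0.75 threshold are hypothetical.

```python
def delta_lines(span: int, file_lines: int) -> int:
    """Lines a delta must carry when it spans `span` one-line commits.

    Toy model: commits touch distinct lines round-robin, so the delta
    grows with the span until it covers the whole file.
    """
    return min(span, file_lines)


def choose_representation(span: int, file_lines: int, threshold: float = 0.75) -> str:
    """Pick what a cache might store: a skip-delta, or a full-text once the
    delta reaches `threshold` of the file size (threshold is illustrative)."""
    if delta_lines(span, file_lines) >= threshold * file_lines:
        return "fulltext"
    return "skip-delta"


for span in (1, 16, 64, 128):
    print(span, delta_lines(span, 100), choose_representation(span, 100))
```

In this model a single delta always carries one line, while a skip-delta over 128 revisions of a 100-line file carries the whole file. That is the case where a cache can simply keep the full-text instead.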

> You've proposed a tiny little simplification of the primary storage
> manager at the expense of decomposing the storage system into two parts,
> radically increasing the overall complexity of the system. Not a win.



Received on Mon Jul 7 08:19:48 2003

This is an archived mail posted to the Subversion Dev mailing list.
