(Note: I'm not advocating we actually do any of this.)
> (Now, I haven't really thought this through, but it seems to me that
> any error-correcting data would have to be proportional in length to
> the thing it was capable of detecting and correcting errors in.
> That would kind of defeat the purpose of using deltas to begin
> with.)
Not at all. Consider this "simple" approach (it's not simple or
advisable in practice, because it would involve slipping a layer
underneath the Berkeley DB, but it's still theoretically sound):
* Include in each page of the database a checksum of the page,
such that you know if a page has been corrupted.
* After every block of N pages in the database, keep an extra
  "parity page" which holds the bitwise parity (XOR) of that
  block of pages.
Now if any single page is corrupted, you blow away that page and
recompute it using the parity page and the other pages in that block.
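
Here's a rough sketch in Python of what that looks like for one block
(the page size, the group size N, the CRC32 checksum, and the function
names are all made up for illustration; a real version would live down
in the database's page layer):

    import zlib

    PAGE_SIZE = 4096   # hypothetical page size
    N = 8              # hypothetical pages per parity group

    def checksum(page):
        # CRC32 is enough to *detect* that a page has been corrupted
        return zlib.crc32(page)

    def parity_page(pages):
        # Bitwise XOR of every page in the block
        parity = bytearray(PAGE_SIZE)
        for page in pages:
            for i, b in enumerate(page):
                parity[i] ^= b
        return bytes(parity)

    def recover(pages, parity, bad_index):
        # XOR the parity page with every *good* page to rebuild the bad one
        rebuilt = bytearray(parity)
        for idx, page in enumerate(pages):
            if idx != bad_index:
                for i, b in enumerate(page):
                    rebuilt[i] ^= b
        return bytes(rebuilt)

    # Toy usage: corrupt one page, notice it via its checksum, rebuild it.
    pages = [bytes([i]) * PAGE_SIZE for i in range(N)]
    sums = [checksum(p) for p in pages]
    parity = parity_page(pages)

    pages[3] = b'\x00' * PAGE_SIZE                    # simulate corruption
    bad = next(i for i, p in enumerate(pages) if checksum(p) != sums[i])
    pages[bad] = recover(pages, parity, bad)
    assert checksum(pages[bad]) == sums[bad]

The point is just that the parity page plus any N-1 of the data pages
determine the remaining page, so one detected-bad page per group is
always recoverable.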
Of course, corruption would probably happen across a swath of adjacent
pages, so instead of computing the parity over a sequential group of
pages you'd compute it over a non-local group (pages 1, 1001, 2001,
...), or over pages which live on different spindles, so that any one
burst of damage hits at most one page per parity group.
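
(As a tiny illustration of that striping, using the 1000-page stride
from the example above; the function name is made up:)

    STRIDE = 1000   # pages 1, 1001, 2001, ... share one parity page

    def parity_group(page_index, stride=STRIDE):
        # Pages with the same index modulo the stride share a parity page.
        return page_index % stride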
The space cost is 1/N times the previous size of your database, plus
the cost of the page checksums.
This is all basic RAID stuff, with checksums thrown in because RAID
normally assumes that disk failures are detectable. There exist more
advanced error correction techniques which can recover from more
complicated failures.