Greg Hudson <ghudson@MIT.EDU> writes:
> (Note: I'm not advocating we actually do any of this.)
>
> > (Now, I haven't really thought this through, but it seems to me that
> > any error-correcting data would have to be proportional in length to
> > the thing it was capable of detecting and correcting errors in.
> > That would kind of defeat the purpose of using deltas to begin
> > with.)
>
> Not at all. Consider this "simple" approach (it's not simple or
> adviseable in practice because it would involve slipping a layer
> underneath the Berkeley DB, but it's still theoretically sound):
>
> * Include in each page of the database a checksum of the page,
> such that you know if a page has been corrupted.
>
> * After every block of N pages in the database, keep an extra
> "parity page" which contains a bitwise parity calculation of
> the block of pages.
>
> Now if any single page is corrupted, you blow away that page and
> recompute it using the parity page and the other pages in that block.
> Of course, corruption would probably happen across a swath of pages,
> so instead of having a parity block across a sequential group of
> blocks you'd keep it across a non-local group of blocks (blocks 1,
> 1001, 2001, ...), or across blocks which live on different spindles.
> The space cost is 1/N times the previous size of your database, plus
> the cost of the page checksums.
>
> This is all basic RAID stuff, with checksums thrown in because RAID
> normally assumes that disk failure are detectable. There exist more
> advanced error correction techniques which can recover from more
> complicated failures.
(Continuing fun theoretical discussion which doesn't affect Subversion:)
That's equivalent to storing parity for the delta itself. But the
checksum in `structure' is for the fulltext which results from
applying the delta: it's checking the end result. It'll catch bugs in
the delta generator and the delta applier.
My original contention was that there's no way to recover from errors
in the delta generator and delta applier without storing parity (or
other reconstruction information, whatever) proportional in size to
the fulltext. Said proportionality is what we're using deltas to
try to avoid.
Received on Sat Oct 21 14:36:27 2006