
Re: Linux Kernel Summit

From: Jim Blandy <jimb_at_zwingli.cygnus.com>
Date: 2001-04-03 06:50:01 CEST

Lim Swee Tat <st_lim@stlim.net> writes:
> I'm not very sure what you propose the use of the checksum should be
> for. Transmission over wire?? or to verify data corruption on data that
> has been stored on disk.

The latter. Older revisions of texts are represented as deltas to be
applied to the fulltext of the next younger revision; the youngest
revision is always stored as fulltext. (There are many other ways to
structure this, with different time/space tradeoffs; we're keeping
things simple to begin with.)
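
To make that layout concrete, here is a minimal sketch in Python. It is
not Subversion's actual storage code or delta format: the names
make_delta, apply_delta and ReverseDeltaStore are made up for
illustration, and the delta is just a list of copy/insert instructions
computed with the standard difflib module.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Return copy/insert instructions that rebuild `target` from `source`.

    Both arguments are lists of lines. The instruction format is a
    hypothetical one chosen for this sketch.
    """
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse lines from source
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # new lines carried in the delta
        # "delete": lines present only in source, nothing to emit
    return ops


def apply_delta(source, delta):
    """Rebuild the target text by replaying `delta` against `source`."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaStore:
    """Youngest revision kept as fulltext; older ones as deltas against
    the next younger revision."""

    def __init__(self):
        self.fulltext = None  # lines of the youngest revision
        self.deltas = []      # deltas[i] rebuilds revision i from revision i + 1

    def commit(self, lines):
        if self.fulltext is not None:
            # The old youngest revision becomes a delta against the new one.
            self.deltas.append(make_delta(lines, self.fulltext))
        self.fulltext = list(lines)
        return len(self.deltas)  # revision number of the new youngest

    def read(self, rev):
        text = self.fulltext
        # Walk from the youngest revision back towards `rev`.
        for i in range(len(self.deltas) - 1, rev - 1, -1):
            text = apply_delta(text, self.deltas[i])
        return text


store = ReverseDeltaStore()
store.commit(["hello\n", "world\n"])
store.commit(["hello\n", "there\n", "world\n"])
store.commit(["goodbye\n", "world\n"])
print(store.read(0))
```

Reading the youngest revision costs nothing extra, while reading
revision 0 here replays two deltas; that is the time/space tradeoff
mentioned above.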

This means that, as newer revisions are committed, the representation
of a particular revision gets "smeared" out over more and more younger
revisions: if the youngest revision is Y, the representation of an
older revision O depends on Y's fulltext and on the Y-O deltas that
lie between them. (This is the way RCS and CVS work, too. SCCS (and
thus BitKeeper?) does things differently.)
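
The dependency count can be spelled out with a small sketch. The
function name depends_on and its return shape are hypothetical, chosen
only to show which stored pieces a given revision relies on.

```python
def depends_on(older, youngest):
    """List the stored pieces needed to rebuild revision `older`
    when `youngest` is the revision held as fulltext."""
    if not 0 <= older <= youngest:
        raise ValueError("need 0 <= older <= youngest")
    # Each pair (r + 1, r) is the delta that turns revision r + 1 into r.
    deltas = [(r + 1, r) for r in range(youngest - 1, older - 1, -1)]
    return {"fulltext": youngest, "deltas": deltas}


print(depends_on(2, 5))
print(len(depends_on(2, 9)["deltas"]))
```

Nothing about revision 2 itself changes between the two calls; only the
youngest revision moved, and the number of deltas it depends on grew
with it.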

The checksum serves to reassure us that this smearing process hasn't
actually done any damage to the file contents. It's there to catch
bugs.
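
As a rough illustration of that check: record a checksum of each
revision's fulltext when it is committed, then compare it against the
text you get back after applying deltas. This is a sketch only. The
choice of SHA-256 and the names digest, checked and ChecksumMismatch
are assumptions for the example, not anything we have settled on.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when reconstructed text does not match the recorded checksum."""


def digest(lines):
    """Checksum of the file contents, given as a list of lines."""
    h = hashlib.sha256()
    for line in lines:
        h.update(line.encode("utf-8"))
    return h.hexdigest()


def checked(rev, reconstructed, recorded):
    """Return `reconstructed` if it matches the checksum recorded for `rev`.

    `recorded` maps revision numbers to checksums taken at commit time.
    """
    actual = digest(reconstructed)
    if actual != recorded[rev]:
        raise ChecksumMismatch(
            f"revision {rev}: expected {recorded[rev]}, got {actual}"
        )
    return reconstructed


# Checksum recorded when revision 0 was committed as fulltext.
recorded = {0: digest(["hello\n", "world\n"])}

# A correct reconstruction passes through untouched.
checked(0, ["hello\n", "world\n"], recorded)

# A reconstruction damaged by a buggy delta is noticed, not repaired.
try:
    checked(0, ["hello\n", "wrold\n"], recorded)
except ChecksumMismatch as err:
    print("corruption detected:", err)
```

The check only detects the problem; it gives no help in repairing it.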

We could, of course, go much further with error-detecting and
-correcting techniques, but it seems to me like a waste of time. What
we need is some way to at least notice that something has gone wrong.
Our recovery strategy is for the sysadmin to go get the backups.

(Now, I haven't really thought this through, but it seems to me that
any error-correcting data would have to be proportional in length to
the thing it was capable of detecting and correcting errors in. That
would kind of defeat the purpose of using deltas to begin with.)

Received on Sat Oct 21 14:36:27 2006

This is an archived mail posted to the Subversion Dev mailing list.
