I'm completely failing to convey the distinction I have in mind.
Let's limit the discussion to the repository; the network protocol is
These two things are different:
- Checking that the bits one wrote to disk are the bits one got back
- Checking that the text of the file revision one stored in the
repository can be correctly reconstructed.
The first checks a single step in a long process. It will catch disk
and OS failures; it will not catch bugs in Subversion.
The second is an end-to-end test: am I able to reconstruct the text I
stored in the repository way back when? It is a more comprehensive
test --- it will catch all the corruption the first test will, and
will also detect corruption caused by our mistakes.
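
To make the distinction concrete, here is a rough sketch in Python. It is
not Subversion code: ToyRepo, BuggyRepo, check_bits and check_text are
names invented for illustration, and zlib compression stands in for
whatever encoding the repository really applies to a file's text. The
stored_sum checksum covers the bytes as written (the first check); the
text_sum checksum covers the text the user handed us (the second check).

```python
import hashlib
import zlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class ToyRepo:
    """Hypothetical store: one encoded blob per revision, plus two checksums."""

    def __init__(self):
        self.revs = []

    def _encode(self, text: bytes) -> bytes:
        return zlib.compress(text)

    def _decode(self, stored: bytes) -> bytes:
        return zlib.decompress(stored)

    def store(self, text: bytes) -> int:
        stored = self._encode(text)
        self.revs.append({
            "stored": stored,
            "stored_sum": sha1(stored),  # check 1: bits written == bits read
            "text_sum": sha1(text),      # check 2: text can be reconstructed
        })
        return len(self.revs) - 1

    def check_bits(self, rev: int) -> bool:
        r = self.revs[rev]
        return sha1(r["stored"]) == r["stored_sum"]

    def check_text(self, rev: int) -> bool:
        r = self.revs[rev]
        try:
            return sha1(self._decode(r["stored"])) == r["text_sum"]
        except zlib.error:
            return False


class BuggyRepo(ToyRepo):
    """Simulates one of our own mistakes: the encoder drops the last byte."""

    def _encode(self, text: bytes) -> bytes:
        return zlib.compress(text[:-1])


# A bug in our code: the stored bits are intact, but the text is wrong.
buggy = BuggyRepo()
buggy_rev = buggy.store(b"hello world\n")
bug_bits_ok = buggy.check_bits(buggy_rev)
bug_text_ok = buggy.check_text(buggy_rev)

# A faulty disk: flip one byte of the stored blob after it was written.
good = ToyRepo()
good_rev = good.store(b"hello world\n")
blob = bytearray(good.revs[good_rev]["stored"])
blob[len(blob) // 2] ^= 0xFF
good.revs[good_rev]["stored"] = bytes(blob)
disk_bits_ok = good.check_bits(good_rev)
disk_text_ok = good.check_text(good_rev)
```

In the buggy case only check_text complains; in the faulty-disk case both
checks complain. That is the sense in which the end-to-end check covers
everything the bit-level check does, and more.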
The checksums that are already present in the design will catch both
bugs in Subversion and corruption due to faulty disks. If you can
see a better place to put the checksum, please do point it out.
I'm trying to remember Bruce Schneier's analogy explaining the
significance of encryption in security. He compares strong encryption
to putting a stake in the ground, and hoping your enemy trips over the
stake. You can argue about how tall the stake should be (how many
bits long your key is, which cipher to use) but your enemy will
probably simply walk around the stake.
We can debate whether and where checksums are necessary, but I
suspect we'll prevent more corruption overall by:
- reviewing the code carefully for bugs
- contributing tests to the regression test suite
- setting up harnesses for continual random testing
- noticing significant simplifications
- commenting code which is unclear
and so on.
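
As an illustration of the random-testing item in the list above, here is a
minimal sketch of a round-trip harness in Python. Nothing here is part of
Subversion: run_random_roundtrips, random_text and buggy_store are
hypothetical names, and zlib.compress / zlib.decompress stand in for
"store a text in the repository" and "fetch it back".

```python
import random
import zlib


def random_text(rng: random.Random, max_len: int = 200) -> bytes:
    """Return a random byte string of length 0..max_len."""
    n = rng.randrange(max_len + 1)
    return bytes(rng.randrange(256) for _ in range(n))


def run_random_roundtrips(store, fetch, trials=200, seed=0):
    """Store random texts, fetch them back, and collect every mismatch."""
    rng = random.Random(seed)  # fixed seed so a failure can be replayed
    failures = []
    for _ in range(trials):
        text = random_text(rng)
        if fetch(store(text)) != text:
            failures.append(text)
    return failures


def buggy_store(text: bytes) -> bytes:
    # Deliberate bug for the demo: silently truncates long texts.
    return zlib.compress(text[:100])


good_failures = run_random_roundtrips(zlib.compress, zlib.decompress)
bad_failures = run_random_roundtrips(buggy_store, zlib.decompress)
```

The correct pair yields no failures, while the truncating store is caught
on the texts longer than 100 bytes. Each failing input is kept, so it can
be turned into a regression test afterwards.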
Received on Sat Oct 21 14:36:27 2006