On 02/16/2010 08:54 AM, Neels J Hofmeyr wrote:
> They are merely half-checks for validity. During normal operation, size and
> mtime should never change, because we don't open write streams to pristines.
> If anyone messes with the pristine store accidentally, we would pick it up
> with the size, or if that stayed the same, with the mtime. But we can pick
> up all cases of bitswaps/disk failure *only* by verifying *full checksum
> validity*!
>
> So, while checking size and mtime gives a sense of basic sanity, it is
> really just a puny excuse for not checking full checksum validity. If we
> really care about correctness of pristines, *every* read of a pristine
> should verify the checksum along the way. (That would include to always read
> the complete pristine, even if just a few lines along the middle are needed)
>
Checking size and mtime gives huge benefits over checking contents. Size
and mtime can be picked up with a single stat(), whereas a checksum
requires open()/read()/.../close(). The data for stat() is usually
stored in the inode, which is read in either case, and is often small
enough to be easily cached. For large work spaces, especially those with
multi-Kbyte files, doing checksum tests on most operations would result
in unacceptable performance.

I think it's fine to compare checksums on any files that are noticed to
have changed (size/mtime), but if a file looks unchanged, assuming that
it *is* unchanged is a fine compromise for the performance gains.
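
To make the trade-off concrete, here is a rough sketch in Python of the
"stat first, checksum only if something looks different" idea. It is not
Subversion code: the names Recorded, record(), is_unchanged() and the
force_checksum flag are made up for illustration, and SHA-1 is only an
assumed choice of checksum.

```python
import hashlib
import os
from typing import NamedTuple


class Recorded(NamedTuple):
    """Hypothetical record of what we knew about a file when we last trusted it."""
    size: int
    mtime_ns: int
    sha1: str


def file_sha1(path, chunk_size=1 << 16):
    """Expensive path: open, read the whole file, close."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record(path):
    """Capture size, mtime and checksum for later comparison."""
    st = os.stat(path)
    return Recorded(st.st_size, st.st_mtime_ns, file_sha1(path))


def is_unchanged(path, rec, force_checksum=False):
    """Cheap path first: a single stat(). Only read the contents when
    size or mtime differ, or when the caller asks for the full check."""
    st = os.stat(path)
    looks_same = st.st_size == rec.size and st.st_mtime_ns == rec.mtime_ns
    if looks_same and not force_checksum:
        # Assume the file is unchanged; a same-size corruption that
        # leaves mtime alone is NOT detected on this path.
        return True
    return file_sha1(path) == rec.sha1
```

With force_checksum=False the common case costs one stat() per file; with
force_checksum=True every file is read in full, which is the cost a full
check on every operation would pay.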

If you want a "--compare-checksum" option that optionally does the full
check, it might be of use to some people. I suspect most people would
avoid using it once they see how much more expensive it is...

Cheers,
mark
Received on 2010-02-16 21:24:52 CET