Jim Blandy wrote:
> I don't think Subversion should be involved in:
> 1) checking that bits were transmitted across the network without
> corruption, or
> 2) checking that bits were stored on disk without corruption.
I've worked with three large (multi-gigabyte), long-lived (5-10
years of history) CVS repositories, and each of them had file
corruption problems. Some problems were due to OS/filesystem
problems (most notably NFS). Some problems seem to have been RCS
bugs (file truncation - probably a disk filled and RCS didn't do
the right thing). Some seem to have been CVS bugs (I'm not even
sure how those files got in that state). Every large repository
I've worked with has accumulated a few dozen of these.
It doesn't do me any good if I have backups -- in every case, I didn't
find the corruption until many months after the fact. The corrupted
revisions were often some of the oldest, so the only time we'd find it
is if a user ran a command like "cvs log" (which digs through all the
revisions) on the corrupt file.
My desire for a checksum is obvious: I want to (a) know if a file
is corrupt, and (b) find the corruption reasonably close to the
time when it happens. Backups, RAIDs, etc., are all nice,
but they don't do me any good if a repository is silently corrupted
over time. With CVS, I'm reduced to stupid things like running rcs
over every file in my repository to check file integrity.
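To make the request concrete, here is a minimal sketch in Python of the
kind of periodic check described above. It is an illustration only, not
something CVS or Subversion provides: the function names (file_digest,
build_manifest, find_corrupt) are hypothetical, SHA-256 is an arbitrary
choice of checksum, and the sketch assumes the manifest is rebuilt after
every legitimate change to the repository files.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def find_corrupt(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Run from a nightly job, a check like this would flag a truncated or
altered file within a day instead of months later. It only detects a
change against the recorded digest; it cannot tell corruption from a
legitimate write, which is why the checksum really belongs inside the
revision control system itself.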
Assuming that the OS will never corrupt a file is not good enough;
some OS somewhere at some time is going to corrupt one. And if
Subversion doesn't have a way to detect that corruption, people
are going to blame it on svn. Or the problem could happen at a
higher level - maybe the svn filesystem or database could corrupt
part of a file's revisions. As you offer pluggable databases,
you'll be opening yourself up to even more points of failure.
The sine qua non of a revision control system is that your sources
are never corrupted. If there is disagreement about the utility
of these sorts of integrity checks, make them an optional setting.
I wager that many administrators will enable these checks even if
they incur a non-trivial performance penalty in the process. CPU
is cheap; my time doing detective work trying to track down corrupt
revision history is expensive.
(Of course you don't see me implementing anything, so I expect my
opinion to be given the appropriate weighting. But I thought I'd
throw in my two cents.)
Received on Sat Oct 21 14:36:27 2006