Kevin Pilch-Bisson <email@example.com> writes:
>> and what are the actions on the checksumming failures?
> That's really hard to say, and is usually at the discretion of the user, and
> is not possible to accurately automate.
I think that's an overstatement. There are sensible error-recovery
actions for your three examples -- all of them have to involve the
user, but the computer can help.
> 1) Mismatch of fulltext checksum in repository. Probably means either
> database corruption or disk failure. Solution: Revert to a backup.
> Automatable: No.
Agree that the actual recovery is not automatable, but automatic
actions can prevent further damage and shorten the outage window.
When such a mismatch is detected, the repository should lock down to
read-only access, abort any transactions in progress with an
unambiguous error message on the user's terminal, and e-mail a report
of the problem to the system administrator.
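A minimal Python sketch of those three automatic steps, assuming MD5
fulltext checksums. `Repository`, `verify_fulltext`, `begin_txn` and the
`notify_admin` callback are hypothetical names for illustration, not
Subversion's actual API; a real server would send the e-mail from the
callback.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when a stored and a computed fulltext checksum disagree."""


class Repository:
    """Hypothetical stand-in for a repository, not Subversion's real API."""

    def __init__(self, notify_admin):
        self.read_only = False
        self.open_txns = set()
        self.aborted = []                 # (txn, message) pairs shown to users
        self.notify_admin = notify_admin  # e.g. a function that sends e-mail

    def begin_txn(self, name):
        # Once locked down, refuse any new write transaction.
        if self.read_only:
            raise PermissionError("repository is read-only pending repair")
        self.open_txns.add(name)

    def lock_down(self, reason):
        # 1. Switch to read-only access.
        self.read_only = True
        # 2. Abort every transaction in progress with an unambiguous message.
        for txn in sorted(self.open_txns):
            self.aborted.append((txn, f"transaction {txn} aborted: {reason}"))
        self.open_txns.clear()
        # 3. Report the problem to the administrator.
        self.notify_admin(f"repository locked read-only: {reason}")

    def verify_fulltext(self, path, data, expected_md5):
        actual = hashlib.md5(data).hexdigest()
        if actual != expected_md5:
            reason = (f"fulltext checksum mismatch for {path}: "
                      f"expected {expected_md5}, got {actual}")
            self.lock_down(reason)
            raise ChecksumMismatch(reason)
```

The point of the design is that detection triggers containment
immediately; the actual repair (restoring from backup) stays with the
administrator.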
> 2) Mismatch of fulltext checksum over the wire. Probably means a broken
> TCP/IP implementation. Solution: fix it/switch ISPs. Automatable: no.
This is likely to be an intermittent fault. On the server side, roll
back any write operations not yet committed. On the client side,
display a warning and retry the operation. If it fails a second time,
give up: dump a detailed error log to a file in /tmp and tell the
user the file's name.
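A client-side sketch of the retry-once policy, in Python. `fetch` is a
hypothetical callable returning the received bytes and the checksum the
server claims for them; MD5 is assumed. The log goes to the system
temporary directory (/tmp on most Unix systems) unless `log_dir` is
given. The server-side rollback is not shown.

```python
import hashlib
import os
import tempfile
import time


class WireChecksumError(Exception):
    def __init__(self, log_path):
        super().__init__(f"checksum mismatch on every attempt; details in {log_path}")
        self.log_path = log_path


def fetch_verified(fetch, log_dir=None, attempts=2):
    """Call fetch() up to `attempts` times; return the first verified data."""
    failures = []
    for attempt in range(1, attempts + 1):
        data, expected = fetch()
        actual = hashlib.md5(data).hexdigest()
        if actual == expected:
            return data
        failures.append(f"attempt {attempt}: expected {expected}, got {actual}")
        if attempt < attempts:
            # Likely an intermittent fault: warn and try again.
            print(f"warning: checksum mismatch over the wire, retrying ({failures[-1]})")
    # Give up: write a detailed log file and tell the user its name.
    fd, log_path = tempfile.mkstemp(prefix="svn-wire-error-", suffix=".log", dir=log_dir)
    with os.fdopen(fd, "w") as log:
        log.write(time.ctime() + "\n")
        log.write("\n".join(failures) + "\n")
    raise WireChecksumError(log_path)
```

The caller would catch `WireChecksumError` and print its message, which
already contains the log file's path.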
> 3) Mismatch of text-base checksum in working copy. Probably means
> either 1) a bug in svn's current code (in the short term) or 2) The
> user somehow managed to edit their text-base copy. Solution: Get a
> new copy of the text base from the repository that is not corrupt.
> Automatable: yes, EXCEPT What if the edits were a week's worth of
> change to a source file (the person didn't realize that it was the
> text-base version they were editing). If we replace it
> automatically with the text-base version, the user is going to be
> mighty pissed. Thus in practice this is not automatable either.
Rename the corrupted text-base file out of the way, tell the user the
new name, and proceed to fetch a new copy. Pick a naming convention
for renamed corrupt files that facilitates debugging and allows
several copies of the same file to be kept (i.e. if the same file gets
corrupted several times, every corrupted copy should be preserved).
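One possible naming convention, sketched in Python: the new name carries
a UTC timestamp and a prefix of the corrupt file's MD5 so a developer
can tell copies apart, and a numeric suffix is added on collision so no
earlier copy is overwritten. The `.corrupt-...` pattern and
`quarantine_text_base` are hypothetical, not an existing Subversion
convention.

```python
import hashlib
import os
import time


def quarantine_text_base(path, now=None):
    """Rename a corrupt text-base out of the way and return the new name.

    Hypothetical convention: <path>.corrupt-<UTC timestamp>-<md5 prefix>[-N],
    so repeated corruptions of one file never overwrite each other.
    """
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()[:8]
    # time.gmtime(None) means "now"; passing `now` makes names reproducible.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    base = f"{path}.corrupt-{stamp}-{digest}"
    candidate, n = base, 1
    # Append -1, -2, ... until the name is free (not safe against
    # another process renaming concurrently).
    while os.path.exists(candidate):
        candidate = f"{base}-{n}"
        n += 1
    os.rename(path, candidate)
    return candidate
```

The caller would print the returned name to the user and then fetch a
fresh text-base from the repository.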
Received on Sun Feb 2 20:58:32 2003