
Re: why the change in the checksums

From: Kevin Pilch-Bisson <kevin_at_pilch-bisson.net>
Date: 2003-02-02 13:04:42 CET

On Sun, Feb 02, 2003 at 08:36:37AM -0800, solo turn wrote:
> does somebody know why exactly the "checksumming, more checksumming"
> was introduced?

As one of the people who was originally (and still is) very much in favour of
the checksum code, I think I'll post as much of an explanation as I can.

The purpose of a Revision Control System is to maintain an EXACT copy of all
of your versions of files. If it fails in the EXACT part, then the rest of
the system is irrelevant, because the system is worthless.

So, for each revision of a file, we store the checksum of the file in the
repository. This allows you to do something like search for database
corruption or disk failures via a cron-job that retrieves each revision of a
file and its checksum, and compares it against the stored checksum. If there
is a mis-match, then that indicates a problem.
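As a rough illustration of that cron-job idea, here is a minimal Python sketch. It assumes MD5 as the digest (the post does not name one) and a hypothetical list of (path, revision, fulltext, stored checksum) records standing in for whatever the repository would hand back. It is not Subversion's actual code or API.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of data (the digest choice is an assumption)."""
    return hashlib.md5(data).hexdigest()


def find_mismatches(records):
    """Return (path, revision) for every record whose contents no longer
    match the checksum stored alongside it.

    records: iterable of (path, revision, fulltext_bytes, stored_checksum).
    """
    bad = []
    for path, revision, fulltext, stored_checksum in records:
        if md5_hex(fulltext) != stored_checksum:
            bad.append((path, revision))
    return bad


# Hypothetical repository dump: revision 2 was damaged after it was stored.
records = [
    ("trunk/README", 1, b"hello\n", md5_hex(b"hello\n")),
    ("trunk/README", 2, b"hellp world\n", md5_hex(b"hello world\n")),
]
print(find_mismatches(records))
```

A real job would only report the mismatches; what to do about them is left to a human, for the reasons given further down.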

We also send the checksum over the wire during updates/checkouts/commits.
This lets us determine whether or not there was a network error in
transmission. Don't try and tell me that TCP already does this, because in
practice, it doesn't[1].
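The end-to-end check could look something like the following Python sketch. The names pack, unpack and ChecksumMismatch are made up for illustration, and MD5 is again an assumption. The point is only that the receiver recomputes the digest over what actually arrived and refuses to continue if it differs from what the sender computed.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when received data does not match the sender's checksum."""


def pack(payload: bytes):
    """Sender side: return the payload together with its checksum."""
    return payload, hashlib.md5(payload).hexdigest()


def unpack(payload: bytes, expected: str) -> bytes:
    """Receiver side: recompute the checksum and compare before accepting."""
    actual = hashlib.md5(payload).hexdigest()
    if actual != expected:
        raise ChecksumMismatch(f"expected {expected}, got {actual}")
    return payload


payload, checksum = pack(b"int main(void) { return 0; }\n")
unpack(payload, checksum)  # arrives intact, accepted

try:
    # Simulate a byte damaged somewhere between sender and receiver.
    unpack(b"int main(void) { return 1; }\n", checksum)
except ChecksumMismatch as err:
    print("transmission error detected:", err)
```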

Finally, we store a copy of the checksum of the text-base file, so that we can
detect if the text-base copy of something has become corrupt. This is
important, because we send binary diffs against that text-base during
network operations, so we can't afford to have it be corrupted.
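A sketch of that last check, under the same assumptions (MD5, hypothetical function names, not the working-copy library's real interface): before a diff is computed against or applied to the text-base, its digest is recomputed and compared with the recorded one.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large text-bases need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def text_base_is_intact(path: str, recorded_checksum: str) -> bool:
    """True if the text-base on disk still matches the recorded checksum."""
    return file_md5(path) == recorded_checksum
```

If text_base_is_intact returns False, any diff computed against that file would be meaningless to the other side, so the operation should stop rather than send it.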
>
> and what are the actions on the checksumming failures?
>
That's really hard to say. It is usually at the discretion of the user, and
cannot be accurately automated.

Let's take an example for each of the types of checksums.

1) Mismatch of fulltext checksum in repository. Probably means either
database corruption or disk failure. Solution: Revert to a backup.
Automatable: No.

2) Mismatch of fulltext checksum over the wire. Probably means a broken
TCP/IP implementation. Solution: Fix it or switch ISPs. Automatable: No.

3) Mismatch of text-base checksum in working copy. Probably means either (a) a
bug in svn's current code (in the short term) or (b) the user somehow managed
to edit their text-base copy. Solution: Get a new copy of the text base from
the repository that is not corrupt. Automatable: Yes, EXCEPT: what if the
edits were a week's worth of changes to a source file (the person didn't
realize that it was the text-base version they were editing)? If we replace it
automatically with the repository's version, the user is going to be mighty
pissed. Thus in practice this is not automatable either.

> we are a little stuck currently cause we don't know if we should
> upgrade from 16.0 and what happens then ....

You get to know about these types of failures, instead of your repository
becoming silently corrupted. Yes, in the short term you may be subjected to
the occasional checksum mismatch caused by a bug in Subversion, but I think
that is a small price to pay in the long run.

[1] I once had an ISP that performed NAT for me. Having written an
implementation of NAT, I know that if you change the address/port of something
in TCP, you need to recalculate the TCP checksum. Problem was, they were
re-calculating the checksum and passing the packets along regardless of
whether or not the original checksum was valid. Thus, I checked out
subversion, and got a corrupted working copy that wouldn't even build.
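To make the failure mode in [1] concrete, here is a simplified Python sketch. It computes the 16-bit ones' complement Internet checksum over a payload only (the real TCP checksum also covers the TCP header and a pseudo-header, which is left out here), and the "broken NAT" is just a recomputation over whatever bytes it happens to hold. The sample payloads are invented for illustration.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum of data (simplified: payload only)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


original = b"int main(void) { return 0; }"
sent_checksum = internet_checksum(original)
end_to_end = hashlib.md5(original).hexdigest()  # digest choice is an assumption

# A byte is damaged before the packet reaches the NAT box.
damaged = b"int main(void) { return 7; }"

# A correct device would notice that the checksum no longer matches.
print(internet_checksum(damaged) == sent_checksum)

# The broken NAT recomputes the checksum over the damaged bytes without
# validating the old one first, so the receiver sees a "valid" packet.
rewritten_checksum = internet_checksum(damaged)
print(internet_checksum(damaged) == rewritten_checksum)

# Only the end-to-end checksum still reveals the corruption.
print(hashlib.md5(damaged).hexdigest() == end_to_end)
```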

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Received on Sun Feb 2 20:07:22 2003

This is an archived mail posted to the Subversion Dev mailing list.