
checksum bug, a great mystery.

From: Ben Collins-Sussman <sussman_at_collab.net>
Date: 2003-02-13 00:36:09 CET

OK, cmpilato and I have been chasing a checksum bug all day. I'm
going to summarize our findings to the list, in hopes that others may
have bright ideas. We've been working with HEAD code of course.

If you run two copies of stress.pl at the same time:

  xterm1: stress.pl -c -s1
  ...wait 10 secs...
  xterm2: stress.pl -s1

Then within 5 to 30 seconds, you'll get a commit error in the second
shell like this:

Transmitting file data ..subversion/libsvn_client/commit.c:669: (apr_err=200014)svn: A checksum mismatch occurred
svn: Commit failed (details follow):
subversion/libsvn_fs/dag.c:1391: (apr_err=200014)
svn: svn_fs__dag_finalize_edits: checksum mismatch, rep "1o":
   expected: 69e5c7f96d30e12e4707fb13637aceab
     actual: b0e641c998cc3eae6fa2f8726d98cddd

After repeating this bug over and over, we noticed that the 'actual'
checksum is *always the same*, on every run!

   actual: b0e641c998cc3eae6fa2f8726d98cddd

After placing debugging statements into reps-strings.c, we concluded
an amazing thing: this specific checksum is being stored in the rep,
even when the rep's data would generate a completely different one.

If you apply the following patch to reps-strings.c, you can see it for
yourself. (Keep reading below the patch)

Index: subversion/libsvn_fs/reps-strings.c
===================================================================
--- subversion/libsvn_fs/reps-strings.c (revision 4860)
+++ subversion/libsvn_fs/reps-strings.c (working copy)
@@ -1044,6 +1044,7 @@
   struct rep_write_baton *wb = baton;
   unsigned char digest[MD5_DIGESTSIZE];
   svn_fs__representation_t *rep;
+ const char *hex;
 
   /* ### Thought: if we fixed apr-util MD5 contexts to allow repeated
      digestification, then we wouldn't need a stream close function at
@@ -1055,6 +1056,21 @@
 
   apr_md5_final (digest, &(wb->md5_context));
 
+ hex = svn_md5_digest_to_cstring (digest, trail->pool);
+
+ if (strcmp (hex, "b0e641c998cc3eae6fa2f8726d98cddd") == 0)
+ {
+ FILE *fp;
+ svn_string_t foostr;
+ svn_fs__rep_contents (&foostr, wb->fs, wb->rep_key, trail);
+ fp = fopen (apr_pstrcat (trail->pool, "./blarg-",
+ wb->rep_key, NULL),
+ "a");
+ fwrite (foostr.data, 1, foostr.len, fp);
+ fclose (fp);
+ printf ("Got THE checksum when writing rep... dumped rep_key to file.\n");
+ }
+
   SVN_ERR (svn_fs__bdb_read_rep (&rep, wb->fs, wb->rep_key, trail));
   memcpy (rep->checksum, digest, MD5_DIGESTSIZE);
   SVN_ERR (svn_fs__bdb_write_rep (wb->fs, wb->rep_key, rep, trail));

It's absolutely baffling. Here's our code, writing data into the
rep-key, accumulating an md5 checksum in the baton as it goes. In
txn_body_write_close_rep(), we call apr_md5_final() and then write the
checksum into the rep. But *sometimes*, for no apparent reason,
apr_md5_final returns

      b0e641c998cc3eae6fa2f8726d98cddd

... which gets stored, and then causes the mismatch error later on.
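
To make the expected invariant concrete, here is a minimal,
self-contained sketch of the pattern described above. It is not the
Subversion code: the baton, the sketch_* function names and the
fixed-size buffer are all hypothetical, and a 64-bit FNV-1a hash
stands in for MD5 so that the example needs no library. It only shows
that a checksum accumulated while writing should equal one recomputed
over the stored bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an MD5 context.  ASSUMPTION: FNV-1a replaces MD5 here
   purely to keep the sketch dependency-free. */
typedef struct { uint64_t state; } sketch_ctx_t;

void sketch_ctx_init (sketch_ctx_t *ctx)
{
  ctx->state = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
}

void sketch_ctx_update (sketch_ctx_t *ctx, const void *data, size_t len)
{
  const unsigned char *p = data;
  size_t i;

  for (i = 0; i < len; i++)
    {
      ctx->state ^= p[i];
      ctx->state *= 0x100000001b3ULL;   /* FNV-1a 64-bit prime */
    }
}

/* Hypothetical write baton: the stored bytes plus the running checksum. */
typedef struct
{
  unsigned char stored[256];
  size_t stored_len;
  sketch_ctx_t ctx;
} sketch_baton_t;

void sketch_open (sketch_baton_t *wb)
{
  assert (wb != NULL);
  wb->stored_len = 0;
  sketch_ctx_init (&wb->ctx);
}

/* Append LEN bytes to the store and fold them into the running checksum.
   Returns 0 on success, -1 if the fixed-size store would overflow. */
int sketch_write (sketch_baton_t *wb, const void *data, size_t len)
{
  if (len > sizeof (wb->stored) - wb->stored_len)
    return -1;
  memcpy (wb->stored + wb->stored_len, data, len);
  wb->stored_len += len;
  sketch_ctx_update (&wb->ctx, data, len);
  return 0;
}

/* Checksum accumulated during the writes (what would be recorded on close). */
uint64_t sketch_running_checksum (const sketch_baton_t *wb)
{
  return wb->ctx.state;
}

/* Checksum recomputed from scratch over the bytes actually stored. */
uint64_t sketch_stored_checksum (const sketch_baton_t *wb)
{
  sketch_ctx_t fresh;

  sketch_ctx_init (&fresh);
  sketch_ctx_update (&fresh, wb->stored, wb->stored_len);
  return fresh.state;
}
```

In this sketch, sketch_running_checksum() and sketch_stored_checksum()
agree after any sequence of successful writes. The bug we are chasing
is a case where the equivalent two values in the real code do not.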

The patch above notices when the evil checksum appears, and dumps the
rep data to disk. Sure enough, running md5 on the dumped file yields
the correct checksum. So we know the correct data is going into the
key... yet our running checksum returns the Evil Checksum instead.

Where does this checksum come from? What makes it special? Why does
it pop out of apr_md5_final every once in a while?

Received on Wed Feb 12 22:37:48 2003

This is an archived mail posted to the Subversion Dev mailing list.