OK, cmpilato and I have been chasing a checksum bug all day. I'm
going to summarize our findings to the list, in hopes that others may
have bright ideas. We've been working with HEAD code of course.

If you run two copies of stress.pl at the same time:

   xterm1:  stress.pl -c -s1
   ...wait 10 secs...
   xterm2:  stress.pl -s1

Then within 5 to 30 seconds, you'll get a commit error in the second
shell like this:

Transmitting file data ..subversion/libsvn_client/commit.c:669: (apr_err=200014)
svn: A checksum mismatch occurred
svn: Commit failed (details follow):
subversion/libsvn_fs/dag.c:1391: (apr_err=200014)
svn: svn_fs__dag_finalize_edits: checksum mismatch, rep "1o":
expected: 69e5c7f96d30e12e4707fb13637aceab
actual: b0e641c998cc3eae6fa2f8726d98cddd
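
Since the failure takes a few tries to show up, the two-xterm recipe can be scripted. The following is only a rough Python sketch, not something we actually used: it assumes stress.pl is executable and on your PATH, and that the second copy exits once its commit fails. The function names and the timeout are made up for illustration.

```python
import subprocess
import time


def stress_commands(script="stress.pl"):
    """Return the two command lines from the recipe: xterm1 first, then xterm2."""
    return [[script, "-c", "-s1"], [script, "-s1"]]


def run_pair(script="stress.pl", delay=10, timeout=120):
    """Start the first copy, wait `delay` seconds, then run the second copy.

    Returns True if the second copy's output mentions a checksum mismatch.
    Assumption: the second copy exits on a failed commit; if it does not,
    it is killed after `timeout` seconds.
    """
    first_cmd, second_cmd = stress_commands(script)
    first = subprocess.Popen(first_cmd)
    try:
        time.sleep(delay)
        second = subprocess.Popen(second_cmd, stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT, text=True)
        try:
            output, _ = second.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            second.kill()
            output, _ = second.communicate()
    finally:
        # Always stop the first copy, whether or not the second one failed.
        first.terminate()
        first.wait()
    return "checksum mismatch" in output
```

Calling run_pair() in a loop until it returns True should reproduce the error without babysitting two terminals.
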
After repeating this bug over and over, we noticed that the 'actual'
checksum is *always the same*, on every run!
actual: b0e641c998cc3eae6fa2f8726d98cddd
After placing debugging statements into reps-strings.c, we concluded
an amazing thing: this specific checksum is being stored in the rep,
even when the rep's data would generate a completely different one.
If you apply the following patch to reps-strings.c, you can see it for
yourself. (Keep reading below the patch.)
Index: subversion/libsvn_fs/reps-strings.c
===================================================================
--- subversion/libsvn_fs/reps-strings.c (revision 4860)
+++ subversion/libsvn_fs/reps-strings.c (working copy)
@@ -1044,6 +1044,7 @@
struct rep_write_baton *wb = baton;
unsigned char digest[MD5_DIGESTSIZE];
svn_fs__representation_t *rep;
+ const char *hex;
/* ### Thought: if we fixed apr-util MD5 contexts to allow repeated
digestification, then we wouldn't need a stream close function at
@@ -1055,6 +1056,21 @@
apr_md5_final (digest, &(wb->md5_context));
+ hex = svn_md5_digest_to_cstring (digest, trail->pool);
+
+ if (strcmp (hex, "b0e641c998cc3eae6fa2f8726d98cddd") == 0)
+ {
+ FILE *fp;
+ svn_string_t foostr;
+ svn_fs__rep_contents (&foostr, wb->fs, wb->rep_key, trail);
+ fp = fopen (apr_pstrcat (trail->pool, "./blarg-",
+ wb->rep_key, NULL),
+ "a");
+ fwrite (foostr.data, 1, foostr.len, fp);
+ fclose (fp);
+ printf("Got THE checksum when writing rep... dumped rep_key to file.\n");
+ }
+
SVN_ERR (svn_fs__bdb_read_rep (&rep, wb->fs, wb->rep_key, trail));
memcpy (rep->checksum, digest, MD5_DIGESTSIZE);
SVN_ERR (svn_fs__bdb_write_rep (wb->fs, wb->rep_key, rep, trail));
It's absolutely baffling. Here's our code, writing data into the
rep-key, accumulating an md5 checksum in the baton as it goes. In
txn_body_write_close_rep(), we call apr_md5_final() and then write the
checksum into the rep. But *sometimes*, for no apparent reason,
apr_md5_final returns
b0e641c998cc3eae6fa2f8726d98cddd
... which gets stored, and then causes the mismatch error later on.
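
For anyone who wants the expected behaviour spelled out: a running MD5 fed chunk by chunk should finalize to the same digest as hashing the whole buffer in one go, no matter how the writes are split up. Here is a minimal Python sketch of that property, using hashlib as a stand-in for the apr_md5_* calls; the sample data and chunk sizes are invented.

```python
import hashlib


def running_md5(chunks):
    """Accumulate an MD5 digest across successive writes, then finalize once."""
    ctx = hashlib.md5()
    for chunk in chunks:
        ctx.update(chunk)
    return ctx.hexdigest()


def split(data, size):
    """Cut `data` into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


data = b"some file contents\n" * 1000   # made-up stand-in for rep data
one_shot = hashlib.md5(data).hexdigest()

# The way the data is chunked must not change the final digest.
assert running_md5(split(data, 7)) == one_shot
assert running_md5(split(data, 4096)) == one_shot
```

So if the bytes reaching the rep are right, a correctly maintained context should never produce a different digest.
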
The patch above notices when the evil checksum appears, and dumps the
rep data to disk. Sure enough, running md5 on the dumped file yields
the correct checksum. So we know the correct data is going into the
key... yet our running checksum returns the Evil Checksum instead.
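
If you don't have an md5 tool handy, the check on the dumped file can be done with a few lines of Python. This is just a sketch: the helper names are made up, and "blarg-1o" in the usage note assumes the rep key from the error message above.

```python
import hashlib

# The digest that keeps turning up in the "actual:" line.
EVIL_CHECKSUM = "b0e641c998cc3eae6fa2f8726d98cddd"


def file_md5(path, bufsize=65536):
    """Return the hex MD5 of the file at `path`, read in fixed-size blocks."""
    ctx = hashlib.md5()
    with open(path, "rb") as fp:
        for block in iter(lambda: fp.read(bufsize), b""):
            ctx.update(block)
    return ctx.hexdigest()


def check_dump(path, stored=EVIL_CHECKSUM):
    """Return (digest of the dumped data, whether it equals the stored one)."""
    actual = file_md5(path)
    return actual, actual == stored
```

For example, check_dump("blarg-1o") returns the digest of the dumped rep data and a flag saying whether it matches the stored checksum; in our runs it does not match.
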
Where does this checksum come from? What makes it special? Why does
it pop out of apr_md5_final every once in a while?
Received on Wed Feb 12 22:37:48 2003