[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

[PATCH] Make FSFS use both SHA1 and MD5 checksums

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Fri, 21 Nov 2008 10:48:59 -0600

Per Dave Glasser's concerns as noted in TODO-1.6, and following the example of
BDB in r33975, I've added the ability to store both SHA1 (for rep-sharing) and
MD5 (for backwards compat and integrity checking) to FSFS. The SHA1 stuff
should be conditional on the FS format number, but I plan on handling that in a
future commit.

The following patch passes all tests, except for svnadmin_tests 12. This test
is a test for failure, and with the patch it still fails, just in a different
way. I'm tempted to tweak the expected error output of the test, but want to
make sure that I'm not hiding more sinister problems with the patch first.

There's still a few rough edges to be fixed in followup commits, but could a
FSFS-minded person give this a look over to make sure things are right?

Thanks,
-Hyrum

[[[
Similar to r33975 with BDB, teach FSFS to store both MD5 and SHA1 checksums.

* subversion/libsvn_fs_fs/fs_fs.c
  (read_rep_offsets): Parse both sha1 and md5 checksums from the node rev.
  (representation_string): Put both the sha1 and md5 checksums, if they exist
    (put the all-zero checksum if they don't).
  (read_rep_baton): Rename context and checksum to reflect type.
  (rep_read_get_baton): Initialize baton contents.
  (rep_read_contents): Update the checksum with the appropriate contents.
  (svn_fs_fs__get_file_delta_stream): More baton variable rename fixes.
  (svn_fs_fs__file_checksum): Return the md5 checksum no matter the type
    requested.
  (svn_fs_fs__rep_copy): Update for new representation checksum members.
  (rep_write_baton): Add contexts for both sha1 and md5.
  (rep_write_contents): Update both contexts.
  (rep_write_get_baton): Initialize both contexts.
  (rep_write_contents_close): Finalize both checksums.
  (write_final_rev): Update for new checksum variable names.

* subversion/libsvn_fs_fs/fs.h
  (representation_t): Remove generic checksum member in favor of specifically
    named checksum member.

* subversoin/libsvn_fs_fs/rep-cache.c
  (svn_fs_fs__get_rep_reference): Make sure only sha1 references are requested.
  (svn_fs_fs__set_rep_reference): Make sure only sha1 references are inserted.
  (svn_fs_fs__inc_rep_reuse): Update variable names.
]]]

Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c (revision 34308)
+++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
@@ -1644,16 +1644,24 @@ read_rep_offsets(representation_t **rep_p,
     return svn_error_create(SVN_ERR_FS_CORRUPT, NULL,
                             _("Malformed text rep offset line in node-rev"));
 
- SVN_ERR(svn_checksum_parse_hex(&rep->checksum, svn_checksum_md5, str, pool));
+ SVN_ERR(svn_checksum_parse_hex(&rep->md5_checksum, svn_checksum_md5, str,
+ pool));
 
- /* Read the reuse count. */
+ /* The remaining fields are only used for formats >= 4, so check that. */
   str = apr_strtok(NULL, " ", &last_str);
   if (str == NULL)
- {
- /* The reuse_count field is only in formats >= 4. */
- return SVN_NO_ERROR;
- }
+ return SVN_NO_ERROR;
 
+ /* Read the SHA1 hash. */
+ if (strlen(str) != (APR_SHA1_DIGESTSIZE * 2))
+ return svn_error_create(SVN_ERR_FS_CORRUPT, NULL,
+ _("Malformed text rep offset line in node-rev"));
+
+ SVN_ERR(svn_checksum_parse_hex(&rep->sha1_checksum, svn_checksum_sha1, str,
+ pool));
+
+ /* Read the reuse count. */
+ str = apr_strtok(NULL, " ", &last_str);
   rep->reuse_count = apr_atoi64(str);
 
   return SVN_NO_ERROR;
@@ -1870,11 +1878,17 @@ representation_string(representation_t *rep,
     return "-1";
   else
     return apr_psprintf(pool, "%ld %" APR_OFF_T_FMT " %" SVN_FILESIZE_T_FMT
- " %" SVN_FILESIZE_T_FMT " %s %" APR_INT64_T_FMT,
+ " %" SVN_FILESIZE_T_FMT " %s %s %" APR_INT64_T_FMT,
                         rep->revision, rep->offset, rep->size,
                         rep->expanded_size,
- svn_checksum_to_cstring_display(rep->checksum,
- pool),
+ rep->md5_checksum ?
+ svn_checksum_to_cstring_display(rep->md5_checksum,
+ pool) :
+ "00000000000000000000000000000000",
+ rep->sha1_checksum ?
+ svn_checksum_to_cstring_display(rep->sha1_checksum,
+ pool) :
+ "0000000000000000000000000000000000000000",
                         rep->reuse_count);
 }
 
@@ -2442,15 +2456,19 @@ struct rep_read_baton
   apr_size_t buf_pos;
   apr_size_t buf_len;
 
- /* A checksum context for summing the data read in order to verify it. */
- svn_checksum_ctx_t *checksum_ctx;
+ /* A checksum context for summing the data read in order to verify it.
+ Note: we don't need to use the sha1 checksum because we're only doing
+ data verification, for which md5 is perfectly safe. */
+ svn_checksum_ctx_t *md5_checksum_ctx;
+
   svn_boolean_t checksum_finalized;
 
   /* The stored checksum of the representation we are reading, its
      length, and the amount we've read so far. Some of this
      information is redundant with rs_list and src_state, but it's
      convenient for the checksumming code to have it here. */
- svn_checksum_t *checksum;
+ svn_checksum_t *md5_checksum;
+
   svn_filesize_t len;
   svn_filesize_t off;
 
@@ -2486,9 +2504,9 @@ rep_read_get_baton(struct rep_read_baton **rb_p,
   b->fs = fs;
   b->chunk_index = 0;
   b->buf = NULL;
- b->checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, pool);
+ b->md5_checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, pool);
   b->checksum_finalized = FALSE;
- b->checksum = svn_checksum_dup(rep->checksum, pool);
+ b->md5_checksum = svn_checksum_dup(rep->md5_checksum, pool);
   b->len = rep->expanded_size;
   b->off = 0;
   b->fulltext_cache_key = fulltext_cache_key;
@@ -2766,22 +2784,22 @@ rep_read_contents(void *baton,
      twice. */
   if (!rb->checksum_finalized)
     {
- SVN_ERR(svn_checksum_update(rb->checksum_ctx, buf, *len));
+ SVN_ERR(svn_checksum_update(rb->md5_checksum_ctx, buf, *len));
       rb->off += *len;
       if (rb->off == rb->len)
         {
- svn_checksum_t *checksum;
+ svn_checksum_t *md5_checksum;
 
           rb->checksum_finalized = TRUE;
- svn_checksum_final(&checksum, rb->checksum_ctx, rb->pool);
- if (!svn_checksum_match(checksum, rb->checksum))
+ svn_checksum_final(&md5_checksum, rb->md5_checksum_ctx, rb->pool);
+ if (!svn_checksum_match(md5_checksum, rb->md5_checksum))
             return svn_error_createf
               (SVN_ERR_FS_CORRUPT, NULL,
                _("Checksum mismatch while reading representation:\n"
                  " expected: %s\n"
                  " actual: %s\n"),
- svn_checksum_to_cstring_display(rb->checksum, rb->pool),
- svn_checksum_to_cstring_display(checksum, rb->pool));
+ svn_checksum_to_cstring_display(rb->md5_checksum, rb->pool),
+ svn_checksum_to_cstring_display(md5_checksum, rb->pool));
         }
     }
 
@@ -2932,7 +2950,8 @@ svn_fs_fs__get_file_delta_stream(svn_txdelta_strea
           /* Create the delta read baton. */
           struct delta_read_baton *drb = apr_pcalloc(pool, sizeof(*drb));
           drb->rs = rep_state;
- drb->checksum = svn_checksum_dup(target->data_rep->checksum, pool);
+ drb->checksum = svn_checksum_dup(target->data_rep->md5_checksum,
+ pool);
           *stream_p = svn_txdelta_stream_create(drb, delta_read_next_window,
                                                 delta_read_md5_digest, pool);
           return SVN_NO_ERROR;
@@ -3235,8 +3254,10 @@ svn_fs_fs__file_checksum(svn_checksum_t **checksum
                          node_revision_t *noderev,
                          apr_pool_t *pool)
 {
+ /* TODO: pass a kind argument to this function, to enable it to choose
+ between rep->md5_checksum and rep->sha1_checksum. */
   if (noderev->data_rep)
- *checksum = svn_checksum_dup(noderev->data_rep->checksum, pool);
+ *checksum = svn_checksum_dup(noderev->data_rep->md5_checksum, pool);
   else
     *checksum = NULL;
 
@@ -3255,7 +3276,8 @@ svn_fs_fs__rep_copy(representation_t *rep,
   rep_new = apr_pcalloc(pool, sizeof(*rep_new));
 
   memcpy(rep_new, rep, sizeof(*rep_new));
- rep_new->checksum = svn_checksum_dup(rep->checksum, pool);
+ rep_new->md5_checksum = svn_checksum_dup(rep->md5_checksum, pool);
+ rep_new->sha1_checksum = svn_checksum_dup(rep->sha1_checksum, pool);
 
   return rep_new;
 }
@@ -4404,7 +4426,8 @@ struct rep_write_baton
      writing to it. */
   void *lockcookie;
 
- svn_checksum_ctx_t *checksum_ctx;
+ svn_checksum_ctx_t *md5_checksum_ctx;
+ svn_checksum_ctx_t *sha1_checksum_ctx;
 
   apr_pool_t *pool;
 
@@ -4421,7 +4444,8 @@ rep_write_contents(void *baton,
 {
   struct rep_write_baton *b = baton;
 
- SVN_ERR(svn_checksum_update(b->checksum_ctx, data, *len));
+ SVN_ERR(svn_checksum_update(b->md5_checksum_ctx, data, *len));
+ SVN_ERR(svn_checksum_update(b->sha1_checksum_ctx, data, *len));
   b->rep_size += *len;
 
   /* If we are writing a delta, use that stream. */
@@ -4493,7 +4517,8 @@ rep_write_get_baton(struct rep_write_baton **wb_p,
 
   b = apr_pcalloc(pool, sizeof(*b));
 
- b->checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, pool);
+ b->sha1_checksum_ctx = svn_checksum_ctx_create(svn_checksum_sha1, pool);
+ b->md5_checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, pool);
 
   b->fs = fs;
   b->parent_pool = pool;
@@ -4576,12 +4601,15 @@ rep_write_contents_close(void *baton)
   rep->revision = SVN_INVALID_REVNUM;
 
   /* Finalize the checksum. */
- svn_checksum_final(&rep->checksum, b->checksum_ctx, b->parent_pool);
+ SVN_ERR(svn_checksum_final(&rep->md5_checksum, b->md5_checksum_ctx,
+ b->parent_pool));
+ SVN_ERR(svn_checksum_final(&rep->sha1_checksum, b->sha1_checksum_ctx,
+ b->parent_pool));
 
   /* Check and see if we already have a representation somewhere that's
      identical to the one we just wrote out. */
   if (ffd->format >= SVN_FS_FS__MIN_REP_SHARING_FORMAT)
- SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, b->fs, rep->checksum,
+ SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, b->fs, rep->sha1_checksum,
                                          b->parent_pool));
   else
     old_rep = NULL;
@@ -4881,7 +4909,7 @@ write_final_rev(const svn_fs_id_t **new_id_p,
           noderev->data_rep->revision = rev;
           SVN_ERR(get_file_offset(&noderev->data_rep->offset, file, pool));
           SVN_ERR(write_hash_rep(&noderev->data_rep->size,
- &noderev->data_rep->checksum, file,
+ &noderev->data_rep->md5_checksum, file,
                                  str_entries, pool));
           noderev->data_rep->expanded_size = noderev->data_rep->size;
         }
@@ -4907,7 +4935,7 @@ write_final_rev(const svn_fs_id_t **new_id_p,
       SVN_ERR(svn_fs_fs__get_proplist(&proplist, fs, noderev, pool));
       SVN_ERR(get_file_offset(&noderev->prop_rep->offset, file, pool));
       SVN_ERR(write_hash_rep(&noderev->prop_rep->size,
- &noderev->prop_rep->checksum, file,
+ &noderev->prop_rep->md5_checksum, file,
                              proplist, pool));
 
       noderev->prop_rep->txn_id = NULL;
Index: subversion/libsvn_fs_fs/fs.h
===================================================================
--- subversion/libsvn_fs_fs/fs.h (revision 34308)
+++ subversion/libsvn_fs_fs/fs.h (working copy)
@@ -252,14 +252,19 @@ typedef struct
  * svn_fs_fs__rep_copy. */
 typedef struct
 {
- /* Checksum for the contents produced by this representation.
+ /* Checksums for the contents produced by this representation.
      This checksum is for the contents the rep shows to consumers,
      regardless of how the rep stores the data under the hood. It is
      independent of the storage (fulltext, delta, whatever).
 
      If checksum is NULL, then for compatibility behave as though this
- checksum matches the expected checksum. */
- svn_checksum_t *checksum;
+ checksum matches the expected checksum.
+
+ The md5 checksum is always filled (with the exception being reading
+ from a repository which store checksums). The sha1 checksum is only
+ computed on a write, for use with rep-sharing, otherwise it is NULL. */
+ svn_checksum_t *md5_checksum;
+ svn_checksum_t *sha1_checksum;
 
   /* Revision where this representation is located. */
   svn_revnum_t revision;
Index: subversion/libsvn_fs_fs/rep-cache.c
===================================================================
--- subversion/libsvn_fs_fs/rep-cache.c (revision 34308)
+++ subversion/libsvn_fs_fs/rep-cache.c (working copy)
@@ -114,6 +114,12 @@ svn_fs_fs__get_rep_reference(representation_t **re
       return SVN_NO_ERROR;
     }
 
+ /* We only allow SHA1 checksums in this table. */
+ if (checksum->kind != svn_checksum_sha1)
+ return svn_error_create(SVN_ERR_BAD_CHECKSUM_KIND, NULL,
+ _("Only SHA1 checksums can be used as keys in the "
+ "rep_cache table.\n"));
+
   if (!ffd->rep_cache.get_rep_stmt)
     SVN_ERR(svn_sqlite__prepare(&ffd->rep_cache.get_rep_stmt, ffd->rep_cache.db,
                   "select revision, offset, size, expanded_size from rep_cache "
@@ -126,7 +132,7 @@ svn_fs_fs__get_rep_reference(representation_t **re
   if (have_row)
     {
       *rep = apr_pcalloc(pool, sizeof(**rep));
- (*rep)->checksum = svn_checksum_dup(checksum, pool);
+ (*rep)->sha1_checksum = svn_checksum_dup(checksum, pool);
       (*rep)->revision = svn_sqlite__column_revnum(ffd->rep_cache.get_rep_stmt,
                                                    0);
       (*rep)->offset = svn_sqlite__column_int(ffd->rep_cache.get_rep_stmt, 1);
@@ -153,11 +159,17 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
   if (ffd->rep_cache.db == NULL)
     return SVN_NO_ERROR;
 
- /* Check to see if we already have a mapping for REP->CHECKSUM. If so,
+ /* We only allow SHA1 checksums in this table. */
+ if (rep->sha1_checksum == NULL)
+ return svn_error_create(SVN_ERR_BAD_CHECKSUM_KIND, NULL,
+ _("Only SHA1 checksums can be used as keys in the "
+ "rep_cache table.\n"));
+
+ /* Check to see if we already have a mapping for REP->SHA1_CHECKSUM. If so,
      and the value is the same one we were about to write, that's
      cool -- just do nothing. If, however, the value is *different*,
      that's a red flag! */
- SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, fs, rep->checksum, pool));
+ SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, fs, rep->sha1_checksum, pool));
 
   if (old_rep)
     {
@@ -174,7 +186,7 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
                               APR_OFF_T_FMT, SVN_FILESIZE_T_FMT,
                               SVN_FILESIZE_T_FMT, APR_OFF_T_FMT,
                               SVN_FILESIZE_T_FMT, SVN_FILESIZE_T_FMT),
- svn_checksum_to_cstring_display(rep->checksum, pool),
+ svn_checksum_to_cstring_display(rep->sha1_checksum, pool),
                  fs->path, old_rep->revision, old_rep->offset, old_rep->size,
                  old_rep->expanded_size, rep->revision, rep->offset, rep->size,
                  rep->expanded_size);
@@ -189,7 +201,8 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
                   "values (:1, :2, :3, :4, :5, 0);", fs->pool));
 
   SVN_ERR(svn_sqlite__bind_text(ffd->rep_cache.set_rep_stmt, 1,
- svn_checksum_to_cstring(rep->checksum, pool)));
+ svn_checksum_to_cstring(rep->sha1_checksum,
+ pool)));
   SVN_ERR(svn_sqlite__bind_int64(ffd->rep_cache.set_rep_stmt, 2, rep->revision));
   SVN_ERR(svn_sqlite__bind_int64(ffd->rep_cache.set_rep_stmt, 3, rep->offset));
   SVN_ERR(svn_sqlite__bind_int64(ffd->rep_cache.set_rep_stmt, 4, rep->size));
@@ -219,13 +232,14 @@ svn_fs_fs__inc_rep_reuse(svn_fs_t *fs,
                   fs->pool));
 
   SVN_ERR(svn_sqlite__bind_text(ffd->rep_cache.inc_select_stmt, 1,
- svn_checksum_to_cstring(rep->checksum, pool)));
+ svn_checksum_to_cstring(rep->sha1_checksum,
+ pool)));
   SVN_ERR(svn_sqlite__step(&have_row, ffd->rep_cache.inc_select_stmt));
 
   if (!have_row)
     return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL,
                              _("Representation for hash '%s' not found"),
- svn_checksum_to_cstring_display(rep->checksum,
+ svn_checksum_to_cstring_display(rep->sha1_checksum,
                                                              pool));
 
   rep->reuse_count =
@@ -242,7 +256,8 @@ svn_fs_fs__inc_rep_reuse(svn_fs_t *fs,
   SVN_ERR(svn_sqlite__bind_int64(ffd->rep_cache.inc_update_stmt, 1,
                                  rep->reuse_count));
   SVN_ERR(svn_sqlite__bind_text(ffd->rep_cache.inc_update_stmt, 2,
- svn_checksum_to_cstring(rep->checksum, pool)));
+ svn_checksum_to_cstring(rep->sha1_checksum,
+ pool)));
   SVN_ERR(svn_sqlite__step_done(ffd->rep_cache.inc_update_stmt));
 
   return svn_sqlite__reset(ffd->rep_cache.inc_update_stmt);

Received on 2008-11-21 17:49:22 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.