[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Make FSFS use both SHA1 and MD5 checksums

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Sat, 22 Nov 2008 13:45:45 -0600

David Glasser wrote:
> On Fri, Nov 21, 2008 at 8:48 AM, Hyrum K. Wright
> <hyrum_wright_at_mail.utexas.edu> wrote:
>> Per Dave Glasser's concerns as noted in TODO-1.6, and following the example of
>> BDB in r33975, I've added the ability to store both SHA1 (for rep-sharing) and
>> MD5 (for backwards compat and integrity checking) to FSFS. The SHA1 stuff
>> should be conditional on the FS format number, but I plan on handling that in a
>> future commit.
>>
>> The following patch passes all tests, except for svnadmin_tests 12. This test
>> is a test for failure, and with the patch it still fails, just in a different
>> way. I'm tempted to tweak the expected error output of the test, but want to
>> make sure that I'm not hiding more sinister problems with the patch first.
>>
>> There's still a few rough edges to be fixed in followup commits, but could a
>> FSFS-minded person give this a look over to make sure things are right?

>> Index: subversion/libsvn_fs_fs/fs_fs.c
>> ===================================================================
>> --- subversion/libsvn_fs_fs/fs_fs.c (revision 34308)
>> +++ subversion/libsvn_fs_fs/fs_fs.c (working copy)
>> @@ -1644,16 +1644,24 @@ read_rep_offsets(representation_t **rep_p,
>> return svn_error_create(SVN_ERR_FS_CORRUPT, NULL,
>> _("Malformed text rep offset line in node-rev"));
>>
>> - SVN_ERR(svn_checksum_parse_hex(&rep->checksum, svn_checksum_md5, str, pool));
>> + SVN_ERR(svn_checksum_parse_hex(&rep->md5_checksum, svn_checksum_md5, str,
>> + pool));
>>
>> - /* Read the reuse count. */
>> + /* The remaining fields are only used for formats >= 4, so check that. */
>
> Throw in a ### or TODO or something if you don't fix this first.

Will do.

>> str = apr_strtok(NULL, " ", &last_str);
>> if (str == NULL)
>> - {
>> - /* The reuse_count field is only in formats >= 4. */
>> - return SVN_NO_ERROR;
>> - }
>> + return SVN_NO_ERROR;
>>
>> + /* Read the SHA1 hash. */
>> + if (strlen(str) != (APR_SHA1_DIGESTSIZE * 2))
>
> It might be worth commenting that the *2 is because we're encoding as
> two chars per byte? Maybe that's obvious.

I think it's obvious, and we don't have such a comment for the md5, so I'm
leaving it out for now.

>> + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL,
>> + _("Malformed text rep offset line in node-rev"));
>> +
>> + SVN_ERR(svn_checksum_parse_hex(&rep->sha1_checksum, svn_checksum_sha1, str,
>> + pool));
>> +
>> + /* Read the reuse count. */
>> + str = apr_strtok(NULL, " ", &last_str);
>> rep->reuse_count = apr_atoi64(str);
>>
>> return SVN_NO_ERROR;
>> @@ -1870,11 +1878,17 @@ representation_string(representation_t *rep,
>> return "-1";
>> else
>> return apr_psprintf(pool, "%ld %" APR_OFF_T_FMT " %" SVN_FILESIZE_T_FMT
>> - " %" SVN_FILESIZE_T_FMT " %s %" APR_INT64_T_FMT,
>> + " %" SVN_FILESIZE_T_FMT " %s %s %" APR_INT64_T_FMT,
>> rep->revision, rep->offset, rep->size,
>> rep->expanded_size,
>> - svn_checksum_to_cstring_display(rep->checksum,
>> - pool),
>> + rep->md5_checksum ?
>> + svn_checksum_to_cstring_display(rep->md5_checksum,
>> + pool) :
>> + "00000000000000000000000000000000",
>
> Do these constants exist elsewhere in the source? Might be cleaner to
> have a #define for them. (Or just use something like "-", although
> that would probably mean more special-case code.)

I don't think so. We just talk about "the zero checksum" or something, but we
never enumerate it.

>> + rep->sha1_checksum ?
>> + svn_checksum_to_cstring_display(rep->sha1_checksum,
>> + pool) :
>> + "0000000000000000000000000000000000000000",
>> rep->reuse_count);
>> }
>>
...
>> Index: subversion/libsvn_fs_fs/fs.h
>> ===================================================================
>> --- subversion/libsvn_fs_fs/fs.h (revision 34308)
>> +++ subversion/libsvn_fs_fs/fs.h (working copy)
>> @@ -252,14 +252,19 @@ typedef struct
>> * svn_fs_fs__rep_copy. */
>> typedef struct
>> {
>> - /* Checksum for the contents produced by this representation.
>> + /* Checksums for the contents produced by this representation.
>> This checksum is for the contents the rep shows to consumers,
>> regardless of how the rep stores the data under the hood. It is
>> independent of the storage (fulltext, delta, whatever).
>>
>> If checksum is NULL, then for compatibility behave as though this
>> - checksum matches the expected checksum. */
>> - svn_checksum_t *checksum;
>> + checksum matches the expected checksum.
>> +
>> + The md5 checksum is always filled (with the exception being reading
>> + from a repository which store checksums). The sha1 checksum is only
>
> I don't understand the parenthetical here.
>
>> + computed on a write, for use with rep-sharing, otherwise it is NULL. */
>
> second comma should be semi-colon.
>
> This paragraph is generally confusing; I think you need to
> differentiate between when the svn_checksum_t's exist (vs NULL), and
> when they are *computed*. Aren't the sha1_checksums non-NULL during
> reads if the sha1 is written in the rep?

I updated the comments in the committed patch.

>> + svn_checksum_t *md5_checksum;
>> + svn_checksum_t *sha1_checksum;
>>
>> /* Revision where this representation is located. */
>> svn_revnum_t revision;
...
>
> Generally looks reasonable to me.

Thanks for taking a look. I've committed the patch in r34338, and plan on doing
a few followups shortly.

-Hyrum

Received on 2008-11-22 20:46:02 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.