[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: checksum / compare_files question

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Wed, 20 May 2009 00:12:52 +0200

John Szakmeister wrote:
> On Tue, May 19, 2009 at 7:59 AM, Stefan Sperling <stsp_at_elego.de> wrote:
> [snip]
>> I'd say a better representation of a wildcard checksum is in order.
>> The API currently overloads the meaning of NULL, which I don't think
>> is a good idea. Whenever I see a NULL pointer, I associate that with
>> "uninitialized" and hence "not OK to use".
>
> Yeah, I'm not typically a fan of that either, although I have to cope
> with that sort behavior quite often (mainly from 3rd party libraries)
> so it doesn't bother me much.
>
>> I don't really care if it's done by setting pointers to NULL,
>> but then we should at least
>>
>> #define SVN_WILDCARD_CHECKSUM NULL
>>
>> and have the code say:
>>
>> if (checksum1 == SVN_WILDCARD_CHECKSUM ||
>> checksum2 == SVN_WILDCARD_CHECKSUM)
>> return TRUE;
>>
>> Or going even further, we could add a "wildchard" checksum type.
>> Something like this:
>
>> Index: subversion/include/svn_checksum.h
>> ===================================================================
>> --- subversion/include/svn_checksum.h (revision 37767)
>> +++ subversion/include/svn_checksum.h (working copy)
>> @@ -43,7 +43,11 @@ typedef enum
>> svn_checksum_md5,
>>
>> /** The checksum is (or should be set to) a SHA1 checksum. */
>> - svn_checksum_sha1
>> + svn_checksum_sha1,
>> +
>> + /** The checksum is a wildcard.
>> + * @since New in 1.7 */
>> + svn_checksum_wildcard
>> } svn_checksum_kind_t;
>>
>
> [snip]
>> svn_checksum_parse_hex() would need a bump of course, but the diff
>> above shows just the docstring diff on purpose.
>>
>> Either approach would make the API more clear I think.
>
> Explicit is better than implicit. :-) FWIW, I think both ideas are
> great. I kind of like the second suggestion more, since it officially
> makes the wildcard it's own "type".

I'd agree, but have one problem remaining: is "the all-zeros wildcard"
applicable to both SHA1 and MD5 checksums? Futhermore, is it then desired
that an MD5 wildcard matches a SHA1 checksum (and vice versa)?

The current code returns true for a wildcard checksum, even if the kinds
were originally different. Say if I give it an MD5 all-zeros and check it
against any SHA1 checksum, I still get a match.

To stay out of that kind of trouble, I'd suggest something like this:

[[[
const svn_checksum_t svn_checksum_wildcard_md5 =
    { "00000000000000000000000000000000", svn_checksum_md5 };
#define SVN_CHECKSUM_WILDCARD_MD5 (&svn_checksum_wildcard_md5)

const svn_checksum_t svn_checksum_wildcard_sha1 =
    { "00000000 00000000 00000000 00000000 00000000", svn_checksum_sha1 };

#define SVN_CHECKSUM_WILDCARD_SHA1 (&svn_checksum_wildcard_sha1)
]]]

Wherever the current code replaces an all-zero checksum with a NULL
svn_checksum_t*, do this instead:
[[[
  if (checksum.kind == svn_checksum_md5 &&
      strcmp(checksum->digest, SVN_CHECKSUM_WILDCARD_MD5->digest) == 0)
    checksum = SVN_CHECKSUM_WILDCARD_MD5; // replace with constant
]]]
(And same for sha1)

Then, use these constants, but also reverse the order of the checks, i.e.
make the "kind" check before the wildcard check. Now, NULL checksums mean
"this is undefined".

[[[
svn_boolean_t
svn_checksum_match(const svn_checksum_t *checksum1,
                   const svn_checksum_t *checksum2)
{
  assert(checksum1 != NULL && checksum2 != NULL);

  if (checksum1->kind != checksum2->kind)
    return FALSE;

  // No need to explicitly check the kind here. Only checking for
  // identity with wildcard constants:
  if (checksum1 == SVN_CHECKSUM_WILDCARD_MD5 ||
      checksum2 == SVN_CHECKSUM_WILDCARD_MD5 ||
      checksum1 == SVN_CHECKSUM_WILDCARD_SHA1 ||
      checksum2 == SVN_CHECKSUM_WILDCARD_SHA1)
    return TRUE;
...
]]]

This way the wildcard checksum constants are literal structs -- a fixed and
reserved address in mem which, if anyone chose to print it to stdout, even
produces sensible output and doesn't crash anything.

Also, the currently overlapping meanings of "is not defined" vs. "is all
zeros" are separated.

Plus, the checksum kind is not lost by marking it a wildcard. It remains
clear whether this is a MD5 or SHA1 wildcard.

But, of course, this is quite some code bloat. The question remaining is
whether it is good to amount the absence of a checksum to a wildcard
checksum, and whether it is good to match a wildcard checksum of one kind to
a checksum of any other kind. If these are both good, we should merely
expand the comment in subversion/include/checksum.h to explain all of this
in detail. If either one of these is no good, I'd rather go all the way, as
in my example.

~Neels

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2320668

Received on 2009-05-20 00:13:48 CEST

This is an archived mail posted to the Subversion Dev mailing list.