svn_fs_fs__prop_rep_equal: non-strict mode, consistency, and MD5

From: Julian Foad <julianfoad_at_gmail.com>
Date: Fri, 25 Sep 2015 19:01:50 +0100

(New thread, taken from an observation in the "No-op changes no longer
dumped..." thread. In that thread, I hadn't spotted that it is only
the props comparison.)

Starting from 1.9.0, the FS API content comparison methods

    svn_fs_contents_changed()
    svn_fs_contents_different()
    svn_fs_props_changed()
    svn_fs_props_different()

are implemented in FSFS by svn_fs_fs__dag_things_different() which calls

    svn_fs_fs__prop_rep_equal(strict=TRUE/FALSE)
    and/or
    svn_fs_fs__file_text_rep_equal(strict=TRUE/FALSE)

* svn_fs_fs__file_text_rep_equal() uses the MD5 checksum as a quick
check: it returns 'not equal' if the MD5s do not match, and otherwise
gets a definitive answer by comparing SHA1s (if present) or the full
text [1]. That's fine.

* svn_fs_fs__prop_rep_equal(), by contrast, reports that two
properties-reps are equal if their MD5 checksums are equal. [2]

These functions were introduced in
http://svn.apache.org/viewvc?view=revision&revision=1572336. The log
message indicates relying on MD5 equality for properties was
intentional, but is it consistent with the general guarantees we want
from Subversion? I think we generally want to rely on at least SHA1
equality, and I think we should apply that rule for all data.

If we change this to return early only if MD5s differ, else fall
through to a full check if MD5s match, the single run-time cost of
falling through is not severe (and is a hit that we take anyway
whenever one set of props is in a txn rather than a revision), and the
frequency of hitting this code path would be near zero -- only cases
where there is an MD5 collision. So I think not relying on MD5
equality here is a definite improvement.
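
To illustrate the control flow I mean, here is a minimal stand-alone
sketch. It is not the attached patch and does not use the real FSFS
types: prop_rep_sketch_t and prop_reps_equal_sketch() are hypothetical
names, and the "full check" is simplified to a byte comparison of two
in-memory buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a property-list representation; this is
   NOT the real FSFS representation_t. */
typedef struct prop_rep_sketch_t
{
  unsigned char md5_digest[16];  /* MD5 of the serialized property list */
  const char *data;              /* serialized property list */
  size_t len;                    /* length of DATA in bytes */
} prop_rep_sketch_t;

/* Return true iff A and B hold identical property lists.

   Differing MD5s prove the lists differ, so that case exits early.
   Matching MD5s are only a hint: fall through to a full comparison. */
static bool
prop_reps_equal_sketch(const prop_rep_sketch_t *a,
                       const prop_rep_sketch_t *b)
{
  assert(a != NULL && b != NULL);

  /* Quick negative answer. */
  if (memcmp(a->md5_digest, b->md5_digest, sizeof(a->md5_digest)) != 0)
    return false;

  /* Definitive answer (assumed here to be a plain byte comparison). */
  if (a->len != b->len)
    return false;
  return a->len == 0 || memcmp(a->data, b->data, a->len) == 0;
}
```

With this shape, two reps whose MD5s collide but whose contents differ
are reported as not equal, and the full comparison is only reached when
the MD5s match.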

In terms of severity, I think we should backport "don't rely on MD5
equality" to 1.9.x, but with no special urgency.

FSFS and FSX are affected equally; BDB is unaffected, as its code in
txn_body_props_changed() doesn't use the MD5 shortcut.

On looking further, I found more concerns with
svn_fs_fs__prop_rep_equal(). In non-strict mode it behaves quite
differently for (rev:rev) comparison than for a (rev:txn) or (txn:txn)
comparison. I don't like this; I think that is likely to lead to
problems.

The four comparison methods in the FS API are tested by
tests/libsvn_fs/fs-test[.c] 48.

* This test only tests (rev:rev) comparisons.
* This test includes a case where text and a property are set
to two versions of 'evil' text that yield the same MD5 sum. I
initially thought this case would ensure test coverage for the unusual
(MD5 collision) case for all functions under test, but it does not.
The 'evil' text forms only a substring of the property-list rep, and
the MD5 sums of the full reps no longer collide.

The 'evil text' sub-case (i=3) is also confusing: it never executes
the code path that handles reps with identical MD5s, because the
special text is just the value of one property and not the complete
rep of the properties list.
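
To make the "substring" point concrete, here is a small sketch that
serializes a single property, assuming the property-list rep uses the
usual hash-dump layout ("K <len>", name, "V <len>", value, "END").
The helper serialize_one_prop_sketch() is hypothetical and is not part
of any Subversion API; it only shows that the bytes being checksummed
carry framing before and after the value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Serialize one property NAME=VALUE into BUF (capacity BUFSIZE) as
   "K <len>\n<name>\nV <len>\n<value>\nEND\n".
   Return the number of bytes written, or 0 if BUF is too small.
   The output is not NUL-terminated.
   Hypothetical helper for illustration only. */
static size_t
serialize_one_prop_sketch(char *buf, size_t bufsize,
                          const char *name,
                          const char *value, size_t value_len)
{
  int n;
  size_t pos;

  assert(buf != NULL && name != NULL && value != NULL);

  /* Framing that precedes the value: key record and value length. */
  n = snprintf(buf, bufsize, "K %zu\n%s\nV %zu\n",
               strlen(name), name, value_len);
  if (n < 0 || (size_t)n >= bufsize)
    return 0;
  pos = (size_t)n;

  /* The value itself, followed by the terminator. */
  if (bufsize - pos < value_len + 5)
    return 0;
  memcpy(buf + pos, value, value_len);
  pos += value_len;
  memcpy(buf + pos, "\nEND\n", 5);
  pos += 5;

  return pos;
}
```

The value starts at a non-zero offset inside the serialized bytes, so a
checksum over the whole rep is computed over different input than a
checksum over the value alone.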

I propose the attached patch, for a start. It doesn't do anything to
resolve the change (regression) of the exact behaviour of non-strict
mode comparisons, which is the subject of another thread. It only
addresses the issues listed in this email.

Thoughts?

- Julian

[1] An extract from svn_fs_fs__file_text_rep_equal():
http://svn.apache.org/viewvc/subversion/tags/1.9.2/subversion/libsvn_fs_fs/fs_fs.c?view=markup#l1417

  /* File text representations always know their checksums - even in a txn. */
  if (memcmp(rep_a->md5_digest, rep_b->md5_digest, sizeof(rep_a->md5_digest)))
    {
      *equal = FALSE;
      return SVN_NO_ERROR;
    }

  /* Paranoia. Compare SHA1 checksums because that's the level of
     confidence we require for e.g. the working copy. */
  if (rep_a->has_sha1 && rep_b->has_sha1)
    {
      *equal = memcmp(rep_a->sha1_digest, rep_b->sha1_digest,
                      sizeof(rep_a->sha1_digest)) == 0;
      return SVN_NO_ERROR;
    }

[2] An extract from svn_fs_fs__prop_rep_equal():
http://svn.apache.org/viewvc/subversion/tags/1.9.2/subversion/libsvn_fs_fs/fs_fs.c?view=markup#l1490

  /* Committed property lists can be compared quickly */
  if ( rep_a && rep_b
      && !svn_fs_fs__id_txn_used(&rep_a->txn_id)
      && !svn_fs_fs__id_txn_used(&rep_b->txn_id))
    {
      /* MD5 must be given. Having the same checksum is good enough for
         accepting the prop lists as equal. */
      *equal = memcmp(rep_a->md5_digest, rep_b->md5_digest,
                      sizeof(rep_a->md5_digest)) == 0;
      return SVN_NO_ERROR;
    }

text/x-diff attachment: md5-for-props-comparison-2.patch

Received on 2015-09-25 20:02:17 CEST
