Symmetry between dump and load
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 19 Dec 2014 12:23:11 +0000
I believe the following symmetries should be true, and testable, and we should test them.
For any valid repository:
* we can dump it
For any valid dump file:
* we can load it into a new repository
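As a rough illustration of how such a symmetry could be checked mechanically, here is a minimal sketch that dumps a repository, loads the dump into a fresh repository, dumps that one too, and compares the two streams. The function names and the injectable `run` parameter are hypothetical, and byte-for-byte equality of the two dump streams is only a stand-in assumption for whatever definition of "equivalent" is eventually agreed on.

```python
import subprocess


def dump_repo(repo_path, run=subprocess.run):
    """Return the dump stream of the repository at repo_path as bytes."""
    result = run(["svnadmin", "dump", "--quiet", repo_path],
                 stdout=subprocess.PIPE, check=True)
    return result.stdout


def load_repo(dump_bytes, new_repo_path, run=subprocess.run):
    """Create an empty repository and load dump_bytes into it."""
    run(["svnadmin", "create", new_repo_path], check=True)
    run(["svnadmin", "load", "--quiet", new_repo_path],
        input=dump_bytes, check=True)


def dump_load_is_symmetric(repo_path, scratch_repo_path, run=subprocess.run):
    """dump -> load -> dump; True if both dump streams are identical.

    Assumption: identical bytes is used as the (strict) equivalence test.
    """
    first = dump_repo(repo_path, run)
    load_repo(first, scratch_repo_path, run)
    second = dump_repo(scratch_repo_path, run)
    return first == second
```

A real test would call `dump_load_is_symmetric("/path/to/repo", "/path/to/scratch")` with `svnadmin` on the PATH; the `run` parameter exists only so the command sequence can be exercised without a Subversion installation.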
WHY?
This thought was triggered after noticing that we keep finding more and more asymmetries (that is, bugs) in dump and load. Most of the ones I have paid attention to are related to mergeinfo. Examples:
#3912 svnadmin load does fail to process dumps with non UTF-8 path names
Why does this matter? Users care about stability. Waiting for a bug to show up, fixing it, and adding a regression test for that particular case gets us only so far. We could be pro-active, and go looking for these sorts of bugs much more aggressively. I think we should.
Why should we declare that these symmetries hold? Because we defined dump and load to be the canonical (or "lowest common denominator") back-up mechanism: its whole purpose is to represent the content of a repository unambiguously and completely and transfer that content to a different repository. (Oops, it fails in the "completely" department: it doesn't represent locks, for one thing.) And because we rely on these symmetries in our understanding and maintenance of the software.
Why should these symmetries be so tight that they can be mechanically tested, without an unmanageable number of intentional differences? Because we can't produce solid software if we can't test it!
HOW?
The meanings of "valid" and "equivalent" will need to be defined carefully. Here are some starting points for definitions.
"valid repository":
* calling any libsvn_repos or higher level APIs, even with bad parameters and including calls that fail;
"valid dump file":
"equivalent repositories":
"equivalent dump files":
FUZZING
How can we possibly test all valid repositories and all valid dump files? Not by hand-crafted test cases, that's certain. However, the technique of repeatable, pseudo-random testing, aka "fuzzing", can enable us to approach closer and closer to complete test coverage, the more time we throw at it. Forget the idea that a test case has to have a predetermined coverage and has to run to completion every time we run "the tests". Instead, when run as part of the normal test suite, this "fuzzer" would generate a small number of test cases from pseudo-random inputs, and run them. These would be different each time it runs.
The "repeatable" part is that, whenever a generated test case fails, the parameters would be logged in a way that allows that specific case to be re-generated. Then it can be examined, re-tested against different builds, and, if it detected a real bug, inserted into the test suite as a separate, static regression test to be run every time.
The test code would also have a mode that tells it to keep generating and running pseudo-random test cases for a long or unlimited time.
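The repeatable part could be sketched roughly as follows. Everything here is hypothetical: `generate_case`, `run_fuzz`, and the shape of a "test case" (a list of operations on paths) are illustrative names, not existing test-suite code. The point is only that each case is fully determined by an integer seed, that failing seeds are logged, and that the same driver covers both the short run and the long or unlimited run.

```python
import itertools
import random
import time


def generate_case(seed):
    """Build one hypothetical test case (a list of repository operations)
    that is fully determined by `seed`."""
    rng = random.Random(seed)  # private generator: same seed, same case
    ops = []
    for _ in range(rng.randint(1, 8)):
        action = rng.choice(["add", "modify", "copy", "delete"])
        path = "/dir%d/file%d" % (rng.randint(0, 3), rng.randint(0, 9))
        ops.append((action, path))
    return ops


def run_fuzz(check, base_seed, count=5, max_seconds=None):
    """Run check(case) on generated cases; return the seeds that failed.

    count=None keeps going until max_seconds has elapsed, or without
    limit if max_seconds is also None.
    """
    deadline = None if max_seconds is None else time.monotonic() + max_seconds
    seeds = itertools.count(base_seed)
    if count is not None:
        seeds = itertools.islice(seeds, count)
    failed = []
    for seed in seeds:
        if deadline is not None and time.monotonic() >= deadline:
            break
        try:
            ok = check(generate_case(seed))
        except Exception:
            ok = False  # a crash in the check counts as a failure
        if not ok:
            print("FAIL: re-run with seed %d" % seed)  # log for replay
            failed.append(seed)
    return failed
```

In the normal test suite the base seed might be taken from the clock (and logged), so that each run explores different cases; a failing seed can then be passed to `generate_case` to rebuild exactly that case for debugging or for a static regression test.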
OTHER SYMMETRIES
Subversion is quite rich in symmetries, more so than some other software because its job is to preserve data.
* svnrdump dump and load should be symmetrical. They should also be equivalent to svnadmin dump and load respectively, except as modified by RA layer constraints.
* svnsync should directly create an equivalent repository.
* Any query to a write-through proxy should return the same result as querying the master.
* Most of the Subversion library APIs have read and write interfaces which should be (broadly) symmetrical. Major ones include FSFS; FS; repos; delta; diff(+patch); RA; and to some extent WC.
* Many low-level two-way conversions should be symmetrical: reading/writing config files, parsing/unparsing mergeinfo.
* Getting more advanced... Any change or series of changes committed to 'trunk', we should be able to commit instead to a branch and then merge to trunk. If there were no changes (or no conflicting changes) made on trunk in the meantime, the end result should be identical.
* 'svn diff -rX:Y' and 'svn diff -rY:X' should be mirror images.
* and many more!
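For the low-level two-way conversions in the list above, a round-trip check might look like the sketch below. It uses a deliberately simplified mergeinfo-like text format (one `path:rangelist` per line, with `*` marking a non-inheritable range); the two functions are hypothetical stand-ins written for this example and are not Subversion's own parsing and unparsing routines.

```python
def parse_mergeinfo(text):
    """Parse 'path:ranges' lines into {path: [(start, end, inheritable)]}."""
    result = {}
    for line in text.splitlines():
        # Split on the last colon so that paths containing ':' survive.
        path, _, rangelist = line.rpartition(":")
        ranges = []
        for item in rangelist.split(","):
            inheritable = not item.endswith("*")
            item = item.rstrip("*")
            start, _, end = item.partition("-")
            ranges.append((int(start), int(end or start), inheritable))
        result[path] = ranges
    return result


def unparse_mergeinfo(mergeinfo):
    """Inverse of parse_mergeinfo, emitting paths in sorted order."""
    lines = []
    for path in sorted(mergeinfo):
        items = []
        for start, end, inheritable in mergeinfo[path]:
            item = str(start) if start == end else "%d-%d" % (start, end)
            items.append(item if inheritable else item + "*")
        lines.append("%s:%s" % (path, ",".join(items)))
    return "\n".join(lines)


# Round trip in both directions on one hand-written sample.
sample = "/branches/b:5,7-9*\n/trunk:1-3"
parsed = parse_mergeinfo(sample)
assert unparse_mergeinfo(parsed) == sample
assert parse_mergeinfo(unparse_mergeinfo(parsed)) == parsed
```

Note that the text-to-text direction only holds for input already in the form the unparser emits: in this sketch `/trunk:5-5` would come back as `/trunk:5`. That is the kind of intentional difference the definition of "equivalent" has to account for.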
Thoughts?
- Julian
This is an archived mail posted to the Subversion Dev mailing list.