"I don't need to test my code, I know it works."It is safe to say that the idea that developers do not introduce bugs has been disproved.
Developers need a test suite to help with:
Fixing Bugs:
Each time a bug is fixed, a test case should be
added to the test suite. Creating a test case
that reproduces a bug is a seemingly obvious
requirement. If a bug cannot be reproduced,
there is no way to be sure a given change
will actually fix the problem. Once a
test case has been created, it can be used
to validate the correctness of a given patch.
Adding a new test case for each bug also
ensures that the same bug will not be
reintroduced in the future.
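For example, a new test case added for a fixed bug might look something like the following sketch. Python is assumed as the scripting language, and the run_client helper and the bug it reproduces are purely hypothetical.

import subprocess

def run_client(args):
    # Run the command line client and capture exit status and output.
    proc = subprocess.run(["svn"] + args, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr

def test_commit_empty_log_1():
    # Reproduces a (hypothetical) crash on an empty log message.
    # Before the fix the client crashed; after the fix it must
    # exit cleanly with a nonzero status.
    status, output = run_client(["commit", "-m", ""])
    assert status != 0, "client must reject the commit, not crash"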
Impact Analysis:
A developer fixing a bug or adding
a new feature needs to know if
a given change breaks other parts
of the code. It may seem obvious,
but keeping a developer from
introducing new bugs is one
of the primary benefits of
using a regression test
system.
Regression Analysis:
When a test regression occurs,
a developer will need to manually
determine what has caused the failure.
The test system is not able
to determine why a test case
failed. The test system should
simply report exactly which test
results changed and when the
last results were generated.
Building:
Building software can be a scary process.
Users who have never built software
may be unwilling to try. Others may
have tried to build a piece of software
in the past, only to be thwarted by
a difficult build process. Even if
the build completed without an error,
how can a user be confident that the
generated executable actually works?
The only workable solution to this
problem is to provide an easily
accessible set of tests that the
user can run after building.
Porting:
Often, users become porters when
the need to run on a previously
unsupported system arises. This
porting process typically requires
some minor tweaking of include files.
It is absolutely critical that
testing be available when porting
since the primary developers
may not have any way to test
changes submitted by someone
doing a port.
Testing:
Different installations
of the exact same OS can
contain subtle differences
that cause software to
operate incorrectly.
Only testing on different
systems will expose problems
of this nature. A test suite
can help identify these sorts
of problems before a program
is actually put to use.
Unique Test Identifiers:
Each test case must have a globally
unique test identifier; this identifier
is just a string. A globally unique
string is required so that test cases
can be individually identified by name,
sorted, and even looked up on the web.
It seems simple, perhaps even blatantly
obvious, but some other test packages
have failed to maintain uniqueness in
test identifiers and developers have
suffered because of it. It is even
desirable for the system to actively
enforce this uniqueness requirement.
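A sketch of how a harness might actively enforce the requirement, assuming Python and a hypothetical registration function:

_registry = {}

def register_test(test_id, test_func):
    # Registering a duplicate identifier is an immediate, fatal error.
    if test_id in _registry:
        raise ValueError("duplicate test identifier: " + test_id)
    _registry[test_id] = test_func

register_test("client-1", lambda: 1)
register_test("client-1", lambda: 0)   # raises ValueError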
Exact Results:
A test case must have one expected
result. If the result of running the
tests does not exactly match the
expected result, the test must fail.
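In other words, the comparison is a strict equality check, as in this minimal sketch:

def check(actual, expected):
    # Anything short of an exact match is a failure.
    return "PASSED" if actual == expected else "FAILED"

print(check("1\n", "1\n"))    # PASSED
print(check("1 \n", "1\n"))   # FAILED: even a stray space counts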
Reproducible Results:
Test results should be reproducible.
If a test result matches the expected
result, it should do so every time
the test is run. External
factors like time stamps must
not affect the results of a test.
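One way to meet this requirement is to normalize volatile output before it is compared, as in the following sketch; the date format shown is just an assumption:

import re

def normalize(output):
    # Replace anything that looks like a time stamp with a fixed
    # token so two runs of the same test produce identical results.
    return re.sub(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "<DATE>", output)

assert normalize("committed 2001-05-01 12:30:00") == "committed <DATE>"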
Self-Contained Tests:
Each test should be self-contained.
Results for one test should not
depend on side effects of previous
tests. This is obviously a good
practice, since one is able to
understand everything a test is
doing without having to look
at other tests. The test system
should also support random access
so that a single test or set of
tests can be run. If a test is not
self-contained, it cannot be run
in isolation.
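A harness can encourage self-containment by handing each test a fresh scratch area, along the lines of this sketch:

import shutil, tempfile

def run_isolated(test_id, test_func):
    # Each test works in its own scratch directory, so no test can
    # observe side effects of a previous one and any single test
    # can be run by itself.
    scratch = tempfile.mkdtemp(prefix=test_id + "-")
    try:
        return test_func(scratch)
    finally:
        shutil.rmtree(scratch)   # nothing survives to the next test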
Selective Execution:
It may not be possible to run
a given set of tests on certain
systems. The suite must provide
a means of selectively running
test cases based on the
environment. The test system
must also provide a way to
selectively run a given
test case or set of test
cases on a per invocation
basis. It would be incredibly
tedious to run the entire
suite to see the results
for a single test.
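Both forms of selection might be implemented along these lines; the needs_exec flag and the platform gate are assumptions made for illustration:

import fnmatch, sys

def select(tests, pattern="*", platform=sys.platform):
    for test_id, test_func, needs_exec in tests:
        # Skip tests the current environment cannot support.
        if needs_exec and platform == "mac":
            continue
        # Run only identifiers matching the requested pattern.
        if fnmatch.fnmatch(test_id, pattern):
            yield test_id, test_func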
No Monitoring:
The tests must run from start to
finish without operator intervention.
Test results must be generated
automatically. It is critical
that an operator not need to
manually compare test results
to figure out which tests failed
and which ones passed.
Automatic Logging of Results:
The system must store test
results so that they can be
compared later. This applies
to machine readable results
as well as human readable
results. For example, assume
we have a test named client-1
that expects a result of 1, but
the test case instead returns 0.
We should expect the system to
store two distinct pieces of
information: first, that the
test failed; second, how the
test failed, meaning how the
expected result differed from
the actual result.
The following example shows the kind of results we might record in a results log file.
client-1 FAILED
client-2 PASSED
client-3 PASSED
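A sketch of recording both pieces of information, with purely illustrative file names:

def log_result(test_id, expected, actual, results, details):
    status = "PASSED" if actual == expected else "FAILED"
    results.write(test_id + " " + status + "\n")   # that it failed
    if status == "FAILED":                         # how it failed
        details.write("%s: expected %r, got %r\n"
                      % (test_id, expected, actual))

with open("results.log", "w") as results, \
     open("details.log", "w") as details:
    log_result("client-1", "1", "0", results, details)
    log_result("client-2", "ok", "ok", results, details)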
Automatic Recovery:
The test system must be able to recover
from crashes and unexpected delays.
For example, a child process might
go into an infinite loop and would
need to be killed. The test shell
itself might also crash or go into
an infinite loop. In these cases,
the test run must automatically
recover and continue with the tests
directly after the one that crashed.
This is critical for a couple of reasons. Nasty crashes and infinite loops most often appear on users' (not developers') systems. Users are not well equipped to deal with these sorts of exceptional situations. It is unrealistic to expect that users will be able to manually recover from disaster and restart crashed test cases. It is an accomplishment just to get them to run the tests in the first place!
Ensuring that the test system actually runs each and every test is critical, since a failing test near the end of the suite might never be noticed if a crash halfway through kept all the tests from being run. This process must be completely automated, no operator intervention should be required.
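One plausible approach, sketched below, journals each test identifier before running it, so a rerun after a crash can skip everything up to and including the test that brought the harness down; hung children are killed via a timeout. The journal file name and the timeout value are arbitrary choices.

import subprocess

def run_all(tests, journal_path="journal.txt", timeout=60):
    try:
        already_started = set(open(journal_path).read().split())
    except FileNotFoundError:
        already_started = set()
    with open(journal_path, "a") as journal:
        for test_id, argv in tests:
            if test_id in already_started:
                continue               # finished, or crashed last run
            journal.write(test_id + "\n")
            journal.flush()            # record *before* running
            try:
                subprocess.run(argv, timeout=timeout)
            except subprocess.TimeoutExpired:
                pass                   # hung child killed; carry on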
Report Results Only:
When a regression is found, a developer
will need to manually determine the reason
for the regression.
The system should tell the developer exactly what
tests have failed, when the last set of
results were generated, and what the previous
results actually were.
Any additional functionality is outside the
scope of the test system.
Platform Specific Results:
Each supported platform should
have an associated set of
test results. The naive
approach would be to maintain
a single set of results and
compare the output for any platform
to the known results. The problem
with this approach is that it does
not provide a way to keep track
of how results differ from one
platform to another. The following
example attempts to clarify
the problem.
Assume you have the following test results, generated on a reference platform before and after a set of changes was committed.
Before (Reference Platform) | After (Reference Platform)
client-1 PASSED             | client-1 PASSED
client-2 PASSED             | client-2 FAILED
It is clear that the change you made introduced
a regression in the client-2
test.
The problem shows up when you try to compare
results generated from this modified code on
some other platform. For example, assume
you got the following results:
Before (Reference Platform) | After (Other Platform)
client-1 PASSED             | client-1 FAILED
client-2 PASSED             | client-2 PASSED
Now things are not at all clear. We know that
client-1 is failing but we don't
know if it is related to the change we just
made. We don't know if this test failed the
last time we ran the tests on this platform,
since we only have results for the reference
platform to compare to. We might have fixed
a bug in client-2, or we might
have done nothing to affect it.
If we instead keep track of test results on a platform by platform basis, we can avoid much of this pain. It is easy to imagine how this problem could get considerably worse if there were 50 or 100 tests that behaved differently from one platform to the next.
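Keeping per-platform results can be as simple as keying the results file on a platform string, as in this sketch (the naming scheme is an assumption):

import platform

def results_path(results_dir="results"):
    # e.g. results/results.linux-x86_64.log
    tag = (platform.system() + "-" + platform.machine()).lower()
    return "%s/results.%s.log" % (results_dir, tag)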
Test Types:
The test suite should support two
types of tests. The first makes
use of an external program
like the svn client.
These kinds of tests will need
to exec an external program and
check the output and exit status
of the child process. Note that
it will not be possible to run
this sort of test on Mac OS.
The second type of test will
load subversion shared libraries
and invoke methods in-process.
This provides the ability to do extensive testing of the various subversion APIs without using the svn client. This also has the nice benefit that it will work on Mac OS, as well as Windows and Unix.
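The two types might be sketched as follows; the shared library name and the symbol checked in the in-process example are stand-ins, not a statement about the actual subversion API.

import ctypes, subprocess

def exec_test(argv, expected_status, expected_output):
    # First type: exec an external program such as the svn client
    # and check its exit status and output.
    proc = subprocess.run(argv, capture_output=True, text=True)
    return (proc.returncode == expected_status
            and proc.stdout == expected_output)

def in_process_test():
    # Second type: load a shared library and call into it directly.
    # The library and symbol names here are illustrative.
    lib = ctypes.CDLL("libsvn_client.so")
    return hasattr(lib, "svn_client_checkout")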
Developers will tend to avoid using a test suite if it is not easy to add new tests and maintain old ones. If developers are uninterested in using the test suite, it will quickly fall into disrepair and become a burden instead of an aid.
Users will simply avoid running the test suite if it is not extremely simple to use. A user should be able to build the software and then run:
% make check
This should run the test suite
and provide a very high level
set of results that includes
how many test results have
changed since the last run.
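That summary might boil down to a count of changed results, along these lines:

def summarize(previous, current):
    # Count results that differ from the last recorded run.
    changed = [t for t in current if previous.get(t) != current[t]]
    print("%d tests run, %d results changed since the last run"
          % (len(current), len(changed)))
    return changed

previous = {"client-1": "PASSED", "client-2": "PASSED"}
current  = {"client-1": "PASSED", "client-2": "FAILED"}
summarize(previous, current)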
While this high level report is useful to developers, they will often need to examine results in more detail. The system should provide a means to manually examine results, compare output, invoke a debugger, and other sorts of low level operations.
The next example shows how a developer might run a specific subset of tests from the command line. The pattern given would be used to do a glob style match on the test case identifiers, and run any that matched.
% svntest "client-*"
The test suite should be packaged along with the source code instead of being made available as a separate download. This significantly simplifies the process of running tests since they are already incorporated into the build tree.
The test suite must support building and running inside and outside of the source directory. For example, a developer might want to run tests on both Solaris and Linux. The developer should be able to run the tests concurrently in two different build directories without having the tests interfere with each other.
As much as possible, the test suite should avoid depending on external programs or libraries. Of course, there is a nasty bootstrap problem with a test suite implemented in a scripting language. A wide variety of systems provide no support for modern scripting languages. We will avoid this issue for now and assume that the scripting language of choice is supported by the system.
For example, the test suite should not depend on CVS to generate test results. Many users will not have access to CVS on the system they want to test subversion on.