I went ahead and changed the subject for this
thread since it was getting overloaded.
Also note that I have attached an updated
goals.html file to this email.
On Tue, 27 Mar 2001, Greg Stein wrote:
> > Can folks take a look at these requirements and suggest
> > any revisions or clarifications you think might be needed?
>
> I like it, but for a couple nits:
> *) automatic recovery is kind of a "3rd generation" type of thing for a test
> harness. that can get difficult. not saying it shouldn't be there, but
>    our expectations should be set properly to "probably not for a while"
Funny, I was thinking this was needed sooner rather than later.
I updated the Automatic Recovery: section to try to explain
why this is needed a bit better. Does it help?
> *) the "Platform Specific Results" section is very unclear, and I think not
> entirely correct/appropriate. Specifically, I would think that we would
> NOT have platform-specific results -- that all platforms should end up
> the same. if there *are* differences, then CVS should have separate
> copies of each platform's expected output (where that differs from the
> "portable, expected output")
I think my description was less than clear about which "results"
I was talking about (the test results compared to the PASSED/FAILED
results one gets by comparing a test's return value to the expected
result). I have added a much more detailed example to the
"Platform Specific Results" section in hopes of addressing this.
> *) you imply that a set of information is saved from "the last run". I'm not
> sure that I buy into saving info about the last run. Essentially, I think
> each test suite run should be stateless. But if we *do* have a need for
> retaining state, then the doc might want to discuss where that is
> properly kept.
Well, yes. Each run should be stateless. I am only talking
about comparing the final results from one run to the next.
Instead of just implying, I added an "Automatic Logging of Results:"
section to try to deal with this. Does that clear things up?
I also added a section about why test names need to be unique
and how developers would interface with the svntest driver.
> *) what's with the 40 column formatting? do you use really huge fonts or
> something, and so you have fewer columns on your screen? I'm finding it
> is actually harder to read the text when there is a newline every five
> words or so. (in the past, I've actually reformatted your emails before
> reading them)
I am just strange :)
> I'd like to see your next draft checked into source control. I think it is
> very well done, overall.
I have my own CVS repo for the svntest code I am working on right
now. I guess I would rather just merge the whole directory in
later, if possible. If folks like my test harness, I would
like to get it added in subdirectories of a new
subversion/subversion/svntest directory. Of course, that
is still a number of weeks off (regular job, you know).
My hope is to have something I can demo at the SVLUG meeting.
Mo
<html>
<head>
<title>SVN Test</title>
</head>
<body bgcolor="white">
<h1>Design goals for the SVN test suite</h1>
<ul>
<li>
<A HREF="#WHY">Why Test?</A>
</li>
<li>
<A HREF="#AUDIENCE">Audience</A>
</li>
<li>
<A HREF="#REQUIREMENTS">Requirements</A>
</li>
<li>
<A HREF="#EASEOFUSE">Ease of Use</A>
</li>
<li>
<A HREF="#LOCATION">Location</A>
</li>
<li>
<A HREF="#EXTERNAL">External dependencies</A>
</li>
</ul>
<A NAME="WHY"><H3>Why Test?</H3></A>
Regression testing is an essential
element of high quality software.
Unfortunately, some developers
have not had first hand exposure
to a high quality testing framework.
Lack of familiarity with the positive
effects of testing can be blamed
for statements like:
<br>
<blockquote>
"I don't need to test my code,
I know it works."
</blockquote>
<p>
It is safe to say that the
idea that developers do not
introduce bugs
has been disproved.
</p>
<A NAME="AUDIENCE"><H3>Audience</H3></A>
The test suite will be used by
both developers and end users.
<p>
<b>Developers</b> need a test suite to help with:
</p>
<p>
<b><i>Fixing Bugs:</i></b><br>
Each time a bug is fixed, a test case should be
added to the test suite. Creating a test case
that reproduces a bug is a seemingly obvious
requirement. If a bug cannot be reproduced,
there is no way to be sure a given change
will actually fix the problem. Once a
test case has been created, it can be used
to validate the correctness of a given patch.
Adding a new test case for each bug also
ensures that the same bug will not be
introduced again in the future.
</p>
<p>
<b><i>Impact Analysis:</i></b><br>
A developer fixing a bug or adding
a new feature needs to know if
a given change breaks other parts
of the code. It may seem obvious,
but keeping a developer from
introducing new bugs is one
of the primary benefits of
using a regression test
system.
</p>
<p>
<b><i>Regression Analysis:</i></b><br>
When a test regression occurs,
a developer will need to manually
determine what has caused the failure.
The test system is not able
to determine why a test case
failed. The test system should
simply report exactly which test
results changed and when the
last results were generated.
</p>
<p>
<b>Users</b> need a test suite to help with:
</p>
<p>
<b><i>Building:</i></b><br>
Building software can be a scary process.
Users who have never built software
may be unwilling to try.
have tried to build a piece of software
in the past, only to be thwarted by
a difficult build process. Even if
the build completed without an error,
how can a user be confident that the
generated executable actually works?
The only workable solution to this
problem is to provide an easily
accessible set of tests that the
user can run after building.
</p>
<p>
<b><i>Porting:</i></b><br>
Often, users become porters when
the need to run on a previously
unsupported system arises. This
porting process typically requires
some minor tweaking of include files.
It is absolutely critical that
testing be available when porting
since the primary developers
may not have any way to test
changes submitted by someone
doing a port.
</p>
<p>
<b><i>Testing:</i></b><br>
Different installations
of the exact same OS can
contain subtle differences
that cause software to
operate incorrectly.
Only testing on different
systems will expose problems
of this nature. A test suite
can help identify these sorts
of problems before a program
is actually put to use.
</p>
<A NAME="REQUIREMENTS"><H3>Requirements</H3></A>
Functional requirements of
an acceptable test suite include:
<p>
<b><i>Unique Test Identifiers:</i></b><br>
Each test case must have a globally
unique test identifier; this identifier
is just a string. A globally unique
string is required so that test cases
can be individually identified by name,
sorted, and even looked up on the web.
It seems simple, perhaps even blatantly
obvious, but some other test packages
have failed to maintain uniqueness in
test identifiers, and developers have
suffered because of it. It is even
desirable for the system to actively
enforce this uniqueness requirement.
</p>
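<p>
As a rough sketch of how the harness might enforce this
requirement (assuming a Python-based driver; the
<code>register_test</code> helper and the registry are
hypothetical names, not existing svntest code):
</p>
<pre><code>
# Hypothetical sketch: a registry that rejects duplicate test identifiers.
_registry = {}

def register_test(test_id, test_func):
    """Add a test case to the suite, enforcing globally unique identifiers."""
    if test_id in _registry:
        raise ValueError("duplicate test identifier: %s" % test_id)
    _registry[test_id] = test_func
</code></pre>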
<p>
<b><i>Exact Results:</i></b><br>
A test case must have one expected
result. If the result of running the
tests does not exactly match the
expected result, the test must fail.
</p>
<p>
<b><i>Reproducible Results:</i></b><br>
Test results should be reproducible.
If a test result matches the expected
result, it should do so every time
the test is run. External
factors like time stamps must
not affect the results of a test.
</p>
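<p>
One way a harness might keep volatile data such as time stamps
from leaking into results is to normalize captured output before
comparing it. A minimal sketch, assuming Python and a purely
hypothetical <code>normalize</code> helper:
</p>
<pre><code>
import re

# Strip volatile data (here, ISO-style time stamps) out of captured
# output so that two identical runs produce identical text.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")

def normalize(output):
    """Replace time stamps with a fixed token before comparison."""
    return TIMESTAMP.sub("TIMESTAMP", output)
</code></pre>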
<p>
<b><i>Self-Contained Tests:</i></b><br>
Each test should be self-contained.
Results for one test should not
depend on side effects of previous
tests. This is obviously a good
practice, since one is able to
understand everything a test is
doing without having to look
at other tests. The test system
should also support random access
so that a single test or set of
tests can be run. If a test is not
self-contained, it cannot be run
in isolation.
</p>
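<p>
A minimal sketch of one way to keep tests self-contained: give
each test its own scratch directory so it never sees another
test's leftovers. The <code>run_isolated</code> helper is a
hypothetical name, assuming a Python-based driver:
</p>
<pre><code>
import shutil
import tempfile

def run_isolated(test_id, test_func):
    """Run one test in its own scratch area so no test depends on another."""
    workdir = tempfile.mkdtemp(prefix=test_id + "-")
    try:
        return test_func(workdir)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
</code></pre>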
<p>
<b><i>Selective Execution:</i></b><br>
It may not be possible to run
a given set of tests on certain
systems. The suite must provide
a means of selectively running
test cases based on the
environment. The test system
must also provide a way to
selectively run a given
test case or set of test
cases on a per invocation
basis. It would be incredibly
tedious to run the entire
suite to see the results
for a single test.
</p>
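<p>
A sketch of how environment-based selection might look, assuming
Python; the <code>excluded_platforms</code> annotation and the
registry dictionary are hypothetical:
</p>
<pre><code>
import sys

def run_selected(registry, platform=sys.platform):
    """Run every test allowed in this environment; mark the rest SKIPPED."""
    results = {}
    for test_id, test_func in sorted(registry.items()):
        if platform in getattr(test_func, "excluded_platforms", ()):
            results[test_id] = "SKIPPED"
        else:
            results[test_id] = test_func()
    return results
</code></pre>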
<p>
<b><i>No Monitoring:</i></b><br>
The tests must run from start to
end without operator intervention.
Test results must be generated
automatically. It is critical
that an operator not need to
manually compare test results
to figure out which tests failed
and which ones passed.
</p>
<p>
<b><i>Automatic Logging of Results:</i></b><br>
The system must store test
results so that they can be
compared later. This applies
to machine readable results
as well as human readable
results. For example, assume
we have a test named
<code>client-1</code> that expects
a result of 1 but instead gets
a result of 0 from the test case.
We should expect the system to
store two distinct pieces of
information: first, that the
test failed; second, how it
failed, meaning how the expected
result differed from the actual
result.
</p>
<p>
The following example shows
the kind of results we might
record in a results log file.
</p>
<p>
<pre><code>
client-1 FAILED
client-2 PASSED
client-3 PASSED
</code></pre>
</p>
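<p>
A minimal sketch of how logged results might be read back and
compared against the current run (Python, hypothetical helpers;
the log format is the one shown above):
</p>
<pre><code>
def load_log(path):
    """Read a results log of 'test-id STATUS' lines into a dictionary."""
    results = {}
    try:
        with open(path) as log:
            for line in log:
                fields = line.split()
                if len(fields) == 2:
                    results[fields[0]] = fields[1]
    except IOError:
        pass  # no previous run has been recorded yet
    return results

def changed_results(previous, current):
    """List every test whose status differs from the last recorded run."""
    return [(test_id, previous.get(test_id), status)
            for test_id, status in sorted(current.items())
            if previous.get(test_id) != status]
</code></pre>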
<p>
<b><i>Automatic Recovery:</i></b><br>
The test system must be able to recover
from crashes and unexpected delays.
For example, a child process might
go into an infinite loop and would
need to be killed. The test shell
itself might also crash or go into
an infinite loop. In these cases,
the test run must automatically
recover and continue with the tests
directly after the one that crashed.
</p>
<p>
This is critical for a couple of
reasons. Nasty crashes and infinite
loops most often appear on users'
(not developers') systems. Users
are not well equipped to deal with
these sorts of exceptional situations.
It is unrealistic to expect that
users will be able to manually
recover from disaster and restart
crashed test cases. It is an
accomplishment just to get them
to run the tests in the first
place!
</p>
<p>
Ensuring that the test system
actually runs each and every
test is critical, since a failing
test near the end of the suite
might never be noticed if a
crash halfway through kept
all the tests from being run.
This process must be completely
automated, no operator intervention
should be required.
</p>
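<p>
One plausible way to get this behavior is to run each test case in
its own child process with a timeout, so a crash or hang only
costs that one test. A sketch in Python; the <code>svntest</code>
command name and its <code>--single</code> flag are hypothetical:
</p>
<pre><code>
import subprocess

def run_with_recovery(test_ids, timeout=300):
    """Run each test in a child process; kill hangs and keep going."""
    results = {}
    for test_id in test_ids:
        try:
            proc = subprocess.run(["svntest", "--single", test_id],
                                  timeout=timeout)
            status = "PASSED" if proc.returncode == 0 else "FAILED"
        except subprocess.TimeoutExpired:
            status = "FAILED"  # child was killed after hanging
        results[test_id] = status
    return results
</code></pre>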
<p>
<b><i>Report Results Only:</i></b><br>
When a regression is found, a developer
will need to manually determine the reason
for the regression.
The system should tell the developer exactly what
tests have failed, when the last set of
results were generated, and what the previous
results actually were.
Any additional functionality is outside the
scope of the test system.
</p>
<p>
<b><i>Platform Specific Results:</i></b><br>
Each supported platform should
have an associated set of
test results. The naive
approach would be to maintain
a single set of results and
compare the output for any platform
to the known results. The problem
with this approach is that it does
not provide a way to keep track
of how results differ from one
platform to another. The following
example attempts to clarify this.</p>
<p>
Assume you have the following
test results generated on a
reference platform before
and after a set of changes
were committed.
</p>
<table BORDER=1 COLS=2>
<tr>
<td><b>Before</b> (Reference Platform)</td>
<td><b>After</b> (Reference Platform)</td>
</tr>
<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 PASSED</code></td>
</tr>
<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 FAILED</code></td>
</tr>
</table>
<p>
It is clear that the change you made introduced
a regression in the <code>client-2</code> test.
The problem shows up when you try to compare
results generated from this modified code on
some other platform. For example, assume
you got the following results:
</p>
<table BORDER=1 COLS=2>
<tr>
<td><b>Before</b> (Reference Platform)</td>
<td><b>After</b> (Other Platform)</td>
</tr>
<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 FAILED</code></td>
</tr>
<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 PASSED</code></td>
</tr>
</table>
<p>
Now things are not at all clear. We know that
<code>client-1</code> is failing but we don't
know if it is related to the change we just
made. We don't know if this test failed the
last time we ran the tests on this platform
since we only have results for the reference
platform to compare to. We might have fixed
a bug in <code>client-2</code>, or we might
have done nothing to affect it.
</p>
<p>
If we instead keep track of test results
on a platform by platform basis, we can
avoid much of this pain. It is easy to
imagine how this problem could get
considerably worse if there were
50 or 100 tests that behaved differently
from one platform to the next.
</p>
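<p>
A minimal sketch of one way to keep results on a per-platform
basis: store one log file per platform and fall back to the
reference platform's log when none exists yet. The file-naming
scheme is a hypothetical illustration, not a decided convention:
</p>
<pre><code>
import os
import sys

def results_path(results_dir, platform=None):
    """Return the results log for this platform, e.g. results.linux2."""
    platform = platform or sys.platform
    path = os.path.join(results_dir, "results." + platform)
    if os.path.exists(path):
        return path
    # No platform-specific log yet: compare against the reference log.
    return os.path.join(results_dir, "results.reference")
</code></pre>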
<p>
<b><i>Test Types:</i></b><br>
The test suite should support two
types of tests. The first makes
use of an external program
like the svn client.
These kinds of tests will need
to exec an external program and
check the output and exit status
of the child process. Note that
it will not be possible to run
this sort of test on Mac OS.
The second type of test will
load subversion shared libraries
and invoke methods in-process.
</p>
<p>
This provides the ability to
do extensive testing of the
various subversion APIs without
using the svn client. This also
has the nice benefit that it
will work on Mac OS, as well
as Windows and Unix.
</p>
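<p>
A sketch of what the first type of test might look like: exec the
svn client and compare both its output and its exit status against
the expected values. Assumes Python; the helper name and argument
conventions are hypothetical:
</p>
<pre><code>
import subprocess

def run_client_test(args, expected_output, expected_status=0):
    """Exec the svn client and check its exit status and output exactly."""
    proc = subprocess.run(["svn"] + args, capture_output=True, text=True)
    if proc.returncode != expected_status:
        return "FAILED"
    if proc.stdout != expected_output:
        return "FAILED"
    return "PASSED"
</code></pre>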
<A NAME="EASEOFUSE"><H3>Ease of Use</H3></A>
<p>
Developers will tend to avoid using
a test suite if it is not easy to
add new tests and maintain old ones.
If developers are uninterested in
using the test suite, it will
quickly fall into disrepair
and become a burden instead of
an aid.
</p>
<p>
Users will simply avoid running
the test suite if it is not
extremely simple to use. A
user should be able to build
the software and then run:
<blockquote>
<code>
% make check
</code>
</blockquote>
This should run the test suite
and provide a very high level
set of results that include
how many test results have
changed since the last run.
</p>
<p>
While this high level report
is useful to developers, they
will often need to examine
results in more detail.
The system should provide a
means to manually examine
results, compare output,
invoke a debugger, and
other sorts of low level
operations.
</p>
<p>
The next example shows how
a developer might run a
specific subset of tests
from the command line. The
pattern given would be used
to do a glob style match on
the test case identifiers,
and run any that matched.
</p>
<blockquote>
<code>
% svntest "client-*"
</code>
</blockquote>
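<p>
The glob matching itself is straightforward; a minimal sketch in
Python using the standard <code>fnmatch</code> module (the helper
name is hypothetical):
</p>
<pre><code>
import fnmatch

def match_tests(test_ids, pattern):
    """Return the test identifiers that match a glob-style pattern."""
    return sorted(t for t in test_ids if fnmatch.fnmatch(t, pattern))

# Example: match_tests(["client-1", "client-2", "wc-1"], "client-*")
# returns ["client-1", "client-2"].
</code></pre>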
<A NAME="LOCATION"><H3>Location</H3></A>
<p>
The test suite should be packaged
along with the source code instead
of being made available as a separate
download. This significantly
simplifies the process of running
tests since they are
already incorporated into
the build tree.
</p>
<p>
The test suite must support
building and running inside
and outside of the source
directory. For example,
a developer might want to
run tests on both Solaris
and Linux. The developer
should be able to run the
tests concurrently in two
different build directories
without having the tests
interfere with each other.
</p>
<A NAME="EXTERNAL"><H3>External program dependencies</H3></A>
<p>
As much as possible, the test suite should avoid
depending on external programs or libraries.
Of course, there is a nasty bootstrap problem
with a test suite implemented in a
scripting language. A wide variety
of systems provide no support for modern
scripting languages. We will avoid
this issue for now and assume that
the scripting language of choice is
supported by the system.
</p>
<p>For example, the test suite should not depend
on CVS to generate test results. Many users
will not have access to CVS on the system
they want to test subversion on.</p>
<hr>
</body>
</html>