The issue of how our code should handle internal errors is a complex one
affected by many non-obvious arguments. We need to set a project policy to
ensure that we are providing a consistent and sensible behaviour, and to guide
writers of new code. There are two questions that we need to answer.
1. If a function detects that some state exists that "can't happen", i.e. it
shows that there must be a bug in the code somewhere, what should it do?
2. Should functions routinely include extra code to check for such bugs?
The type of check in question tends to be that a parameter passed to or
returned from an API function has a valid value, i.e. a value that is allowed
by the API's documentation. (This discussion is not concerned with checking
the validity of data read from a file, network, keyboard, etc.)
Typically this bug detection happens passively by reaching the "default" case
of a "switch" statement that handled all valid values, or actively by use of
the "assert" macro or explicit code.
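As a minimal sketch of the two detection styles just described (the type and function names here are hypothetical, invented purely for illustration, and are not taken from our code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical enumeration, for illustration only. */
typedef enum { KIND_FILE, KIND_DIR } node_kind_t;

/* Passive detection: every valid value has its own case, so reaching
   "default" means a caller passed a value the API does not allow. */
static const char *kind_name(node_kind_t kind)
{
  switch (kind)
    {
    case KIND_FILE:
      return "file";
    case KIND_DIR:
      return "dir";
    default:
      /* "Can't happen".  What to do here is question 1; abort() is
         only one of the candidate answers. */
      abort();
    }
}

/* Active detection: explicitly check the parameter on entry. */
static size_t name_length(const char *name)
{
  assert(name != NULL);  /* compiled out when NDEBUG is defined */
  return strlen(name);
}
```

With valid input both functions behave normally; the checks only matter when a caller violates the documented API.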
+ "A library should attempt never to crash, and only to return errors to the
Justification: the application may want to save its state before exiting, or to
re-try the operation in a different way in the hope of avoiding the bug. On
the other hand, checking for all possible bugs would be impossible, and even
just checking everywhere for null pointers would be a significant burden, so
this is perhaps an unfeasible goal.
+ "Application code can just crash if it has no significant state to save or
other strong reason to do otherwise, as that is simplest and no less helpful
than exiting with an error message."
Justification: We have lots of cases where we just crash when something
violates the API. For example, we normally don't check for a null pointer,
unless that's explicitly allowed. Adding error messages for such cases that
"cannot happen" just clutters the code. It doesn't really help the user, and
it doesn't help with bug reporting or debugging either, as a reproduction
recipe is still needed. And, as a minor thing, it makes the translators have
to translate really obscure errors.
+ "All possible checks should be enabled in a product like this, even in
production code, to give maximum safety of data."
This argument implies that the code should use "abort" or throw an error, and
not use "assert" which is typically inactive in released code. It assumes that
data is safer if any such bug that still exists in a released version of the
software is found as soon as possible, and causes the software to stop what it is
doing. That assumption seems almost self-evident, but of course it is
impossible to insert enough checks to catch all possible bugs. It also assumes
that there is no significant penalty in adding all possible checks. That
assumption is false, since some consistency checks can easily take a long time
to execute, and many simple checks would be repeated with a great deal of
+ "We want the user to be given some idea of what has happened, rather than a
This argument implies that a specific error message should be generated rather
than an "abort". It assumes that most such errors will be detected in the
application software rather than by the hardware or the operating system.
Since uninitialised or null pointers form a significant proportion of this
class of errors, that assumption probably does not hold.
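To make the alternatives in the last two arguments concrete, here is a rough sketch, assuming a hypothetical error type and macro name (neither is an existing API of ours). It contrasts a check that stays active in release builds and reports where it failed before aborting with a check that returns an error to the caller:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes, for illustration only. */
typedef enum { ERR_NONE = 0, ERR_INTERNAL = 1 } err_t;

/* Always-active check: unlike assert(), it is not disabled by NDEBUG.
   It prints where the bug was detected and then aborts. */
#define CHECK_OR_ABORT(cond)                                          \
  do {                                                                \
    if (!(cond))                                                      \
      {                                                               \
        fprintf(stderr, "internal error: %s:%d: check failed: %s\n", \
                __FILE__, __LINE__, #cond);                           \
        abort();                                                      \
      }                                                               \
  } while (0)

/* Error-returning check: the function reports the API violation and
   leaves it to the caller to decide whether to exit, retry or save
   state first.  Assumed contract: "result" is non-null, "den" is
   non-zero. */
static err_t checked_div(int num, int den, int *result)
{
  if (result == NULL || den == 0)
    return ERR_INTERNAL;
  *result = num / den;
  return ERR_NONE;
}
```

Neither form helps when the bug shows up first as a hardware or operating system fault, such as dereferencing an uninitialised pointer before any check is reached.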
Not much of a consensus is emerging from these arguments so far. One other
thing we could look at is the status quo; it seems that at present we are
mostly exiting with an error message when we detect a bug.
Please review and expand the "arguments" section before we draw any conclusions.
Received on Wed Jun 1 02:42:46 2005