Guide to the Software Engineering Body of Knowledge (SWEBOK)
 
CHAPTER 5

SOFTWARE TESTING

 

 

ACRONYM

 

SRET: Software Reliability Engineered Testing

 

INTRODUCTION

 

Testing is an activity performed for evaluating product quality, and for improving it, by identifying defects and problems.

Software testing consists of the dynamic verification of the behavior of a program on a finite set of test cases, suitably selected from the usually infinite executions domain, against the expected behavior.

In the above definition, italicized words correspond to key issues in identifying the Knowledge Area of Software Testing. In particular:

  • Dynamic: This term means that testing always implies executing the program on (valued) inputs. To be precise, the input value alone is not always sufficient to determine a test, since a complex, nondeterministic system might react to the same input with different behaviors, depending on the system state. In this KA, though, the term “input” will be maintained, with the implied convention that its meaning also includes a specified input state, in those cases in which it is needed. Different from testing and complementary to it are static techniques, as described in the Software Quality KA.

  • Finite: Even in simple programs, so many test cases are theoretically possible that exhaustive testing could require months or years to execute. This is why in practice the whole test set can generally be considered infinite. Testing always implies a trade-off between limited resources and schedules on the one hand and inherently unlimited test requirements on the other.

  • Selected: The many proposed test techniques differ essentially in how they select the test set, and software engineers must be aware that different selection criteria may yield vastly different degrees of effectiveness. How to identify the most suitable selection criterion under given conditions is a very complex problem; in practice, risk analysis techniques and test engineering expertise are applied.

  • Expected: It must be possible, although not always easy, to decide whether the observed outcomes of program execution are acceptable or not, otherwise the testing effort would be useless. The observed behavior may be checked against user expectations (commonly referred to as testing for validation), against a specification (testing for verification), or, finally, against the anticipated behavior from implicit requirements or reasonable expectations. See, in the Software Requirements KA, topic 6.4 Acceptance Tests.

The view of software testing has evolved towards a more constructive one. Testing is no longer seen as an activity which starts only after the coding phase is complete, with the limited purpose of detecting failures. Software testing is now seen as an activity which should encompass the whole development and maintenance process and is itself an important part of the actual product construction. Indeed, planning for testing should start with the early stages of the requirement process, and test plans and procedures must be systematically and continuously developed, and possibly refined, as development proceeds. These test planning and designing activities themselves constitute useful input for designers in highlighting potential weaknesses (like design oversights or contradictions, and omissions or ambiguities in the documentation).

It is currently considered that the right attitude towards quality is one of prevention: it is obviously much better to avoid problems than to correct them. Testing must be seen, then, primarily as a means for checking not only whether the prevention has been effective, but also for identifying faults in those cases where, for some reason, it has not been effective. It is perhaps obvious but worth recognizing that, even after successful completion of an extensive testing campaign, the software could still contain faults. The remedy for software failures experienced after delivery is provided by corrective maintenance actions. Software maintenance topics are covered in the Software Maintenance KA.

In the Software Quality KA (See topic 3.3 Software Quality Management Techniques), software quality management techniques are notably categorized into static techniques (no code execution) and dynamic techniques (code execution). Both categories are useful. This KA focuses on dynamic techniques.

Software testing is also related to software construction (see topic 3.4 Construction Testing in that KA). Unit and integration testing are intimately related to software construction, if not part of it.

 

BREAKDOWN OF TOPICS

 

The breakdown of topics for the Software Testing KA is shown in Figure 1.

The first subarea describes Software Testing Fundamentals. It covers the basic definitions in the field of software testing, the basic terminology and key issues, and its relationship with other activities.

The second subarea, Test Levels, consists of two (orthogonal) topics: 2.1 lists the levels into which the testing of large software is traditionally subdivided, and 2.2 considers testing for specific conditions or properties and is referred to as objectives of testing. Not all types of testing apply to every software product, nor has every possible type been listed.

The test target and test objective together determine how the test set is identified, both with regard to its consistency (how much testing is enough for achieving the stated objective) and its composition (which test cases should be selected for achieving the stated objective), although usually the “for achieving the stated objective” part is left implicit and only the first part of the two questions above is posed. Criteria for addressing the first question are referred to as test adequacy criteria, while those addressing the second question are the test selection criteria.

Several Test Techniques have been developed in the past few decades, and new ones are still being proposed. Generally accepted techniques are covered in subarea 3.

Test-related Measures are dealt with in subarea 4.

Finally, issues relative to Test Process are covered in subarea 5.

 

1. Software Testing Fundamentals

 

1.1. Testing-related terminology

1.1.1. Definitions of testing and related terminology [Bei90:c1; Jor02:c2; Lyu96:c2s2.2] (IEEE610.12-90)

A comprehensive introduction to the Software Testing KA is provided in the recommended references.

1.1.2. Faults vs. Failures [Jor02:c2; Lyu96:c2s2.2; Per95:c1; Pfl01:c8] (IEEE610.12-90; IEEE982.1-88)

Many terms are used in the software engineering literature to describe a malfunction, notably fault, failure, error, and several others. This terminology is precisely defined in IEEE Standard 610.12-1990, Standard Glossary of Software Engineering Terminology (IEEE610-90), and is also discussed in the Software Quality KA. It is essential to clearly distinguish between the cause of a malfunction, for which the term fault or defect will be used here, and an undesired effect observed in the system’s delivered service, which will be called a failure. Testing can reveal failures, but it is the faults that can and must be removed.

However, it should be recognized that the cause of a failure cannot always be unequivocally identified. No theoretical criteria exist to definitively determine what fault caused the observed failure. It might be said that it was the fault that had to be modified to remove the problem, but other modifications could have worked just as well. To avoid ambiguity, some authors prefer to speak of failure-causing inputs (Fra98) instead of faults—that is, those sets of inputs that cause a failure to appear.

1.2. Key issues

1.2.1. Test selection criteria/Test adequacy criteria (or stopping rules) [Pfl01:c8s7.3; Zhu97:s1.1] (Wey83; Wey91; Zhu97)

A test selection criterion is a means of deciding what a suitable set of test cases should be. A selection criterion can be used for selecting the test cases or for checking whether a selected test suite is adequate—that is, to decide whether the testing can be stopped. See also the sub-topic Termination, under topic 5.1 Practical considerations.

1.2.2. Testing effectiveness/Objectives for testing [Bei90:c1s1.4; Per95:c21] (Fra98)

Testing is the observation of a sample of program executions. Sample selection can be guided by different objectives: it is only in light of the objective pursued that the effectiveness of the test set can be evaluated.

1.2.3. Testing for defect identification [Bei90:c1; Kan99:c1]

In testing for defect identification, a successful test is one which causes the system to fail. This is quite different from testing to demonstrate that the software meets its specifications or other desired properties, in which case testing is successful if no (significant) failures are observed.

1.2.4. The oracle problem [Bei90:c1] (Ber96, Wey83)

An oracle is any (human or mechanical) agent which decides whether a program behaved correctly in a given test, and accordingly produces a verdict of “pass” or “fail.” There exist many different kinds of oracles, and oracle automation can be very difficult and expensive.

1.2.5. Theoretical and practical limitations of testing [Kan99:c2] (How76)

Testing theory warns against ascribing an unjustified level of confidence to a series of passed tests. Unfortunately, most established results of testing theory are negative ones, in that they state what testing can never achieve as opposed to what it actually achieved. The most famous quotation in this regard is the Dijkstra aphorism that “program testing can be used to show the presence of bugs, but never to show their absence.” The obvious reason is that complete testing is not feasible in real software. Because of this, testing must be driven based on risk and can be seen as a risk management strategy.

1.2.6. The problem of infeasible paths [Bei90:c3]

Infeasible paths, the control flow paths that cannot be exercised by any input data, are a significant problem in path-oriented testing, and particularly in the automated derivation of test inputs for code-based testing techniques.

1.2.7. Testability [Bei90:c3, c13] (Bac90; Ber96a; Voa95)

The term “software testability” has two related but different meanings: on the one hand, it refers to the degree to which it is easy for software to fulfill a given test coverage criterion, as in (Bac90); on the other hand, it is defined as the likelihood, possibly measured statistically, that the software will expose a failure under testing, if it is faulty, as in (Voa95, Ber96a). Both meanings are important.

1.3. Relationships of testing to other activities

Software testing is related to but different from static software quality management techniques, proofs of correctness, debugging, and programming. However, it is informative to consider testing from the point of view of software quality analysts and of certifiers.

  • Testing vs. Static Software Quality Management techniques. See also the Software Quality KA, subarea 2. Software Quality Management Processes. [Bei90:c1; Per95:c17] (IEEE1008-87)

  • Testing vs. Correctness Proofs and Formal Verification [Bei90:c1s5; Pfl01:c8].

  • Testing vs. Debugging. See also the Software Construction KA, topic 3.4 Construction testing [Bei90:c1s2.1] (IEEE1008-87).

  • Testing vs. Programming. See also the Software Construction KA, topic 3.4 Construction testing [Bei90:c1s2.3].

  • Testing and Certification (Wak99).

 

2. Test Levels

 

2.1. The target of the test

Software testing is usually performed at different levels along the development and maintenance processes. That is to say, the target of the test can vary: a single module, a group of such modules (related by purpose, use, behavior, or structure), or a whole system. [Bei90:c1; Jor02:c13; Pfl01:c8] Three big test stages can be conceptually distinguished, namely Unit, Integration, and System. No process model is implied, nor are any of those three stages assumed to have greater importance than the other two.

2.1.1. Unit testing [Bei90:c1; Per95:c17; Pfl01:c8s7.3] (IEEE1008-87)

Unit testing verifies the functioning in isolation of software pieces which are separately testable. Depending on the context, these could be the individual subprograms or a larger component made of tightly related units. A test unit is defined more precisely in the IEEE Standard for Software Unit Testing (IEEE1008-87), which also describes an integrated approach to systematic and documented unit testing. Typically, unit testing occurs with access to the code being tested and with the support of debugging tools, and might involve the programmers who wrote the code.
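
As an illustration only (the unit under test and the framework choice are not prescribed by this Guide), a minimal documented unit test written in Python's standard unittest framework might look like this:

    import unittest

    def is_leap_year(year: int) -> bool:
        """Illustrative unit under test: Gregorian leap-year rule."""
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    class TestIsLeapYear(unittest.TestCase):
        """Each test exercises the unit in isolation against an expected result."""

        def test_divisible_by_four(self):
            self.assertTrue(is_leap_year(2024))

        def test_century_not_leap(self):
            self.assertFalse(is_leap_year(1900))

        def test_divisible_by_400(self):
            self.assertTrue(is_leap_year(2000))

    if __name__ == "__main__":
        unittest.main()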

2.1.2. Integration testing [Jor02:c13, 14; Pfl01:c8s7.4]

Integration testing is the process of verifying the interaction between software components. Classical integration testing strategies, such as top-down or bottom-up, are used with traditional, hierarchically structured software.

Modern systematic integration strategies are rather architecture-driven, which implies integrating the software components or subsystems based on identified functional threads. Integration testing is a continuous activity, at each stage of which software engineers must abstract away lower-level perspectives and concentrate on the perspectives of the level they are integrating. Except for small, simple software, systematic, incremental integration testing strategies are usually preferred to putting all the components together at once, which is pictorially called “big bang” testing.

2.1.3. System testing [Jor02:c15; Pfl01:c9]

System testing is concerned with the behavior of a whole system. The majority of functional failures should already have been identified during unit and integration testing. System testing is usually considered appropriate for comparing the system to the non-functional system requirements, such as security, speed, accuracy, and reliability. External interfaces to other applications, utilities, hardware devices, or the operating environment are also evaluated at this level. See the Software Requirements KA for more information on functional and non-functional requirements.

2.2. Objectives of Testing [Per95:c8; Pfl01:c9s8.3]

Testing is conducted in view of a specific objective, which is stated more or less explicitly, and with varying degrees of precision. Stating the objective in precise, quantitative terms allows control to be established over the test process.

Testing can be aimed at verifying different properties. Test cases can be designed to check that the functional specifications are correctly implemented, which is variously referred to in the literature as conformance testing, correctness testing, or functional testing. However, several other nonfunctional properties may be tested as well, including performance, reliability, and usability, among many others.

Other important objectives for testing include (but are not limited to) reliability measurement, usability evaluation, and acceptance, for which different approaches would be taken. Note that the test objective varies with the test target; in general, different purposes are addressed at different levels of testing.

References recommended above for this topic describe the set of potential test objectives. The sub-topics listed below are those most often cited in the literature. Note that some kinds of testing are more appropriate for custom-made software packages (installation testing, for example) and others for generic products (like beta testing).

2.2.1. Acceptance/qualification testing [Per95:c10; Pfl01:c9s8.5] (IEEE12207.0-96:s5.3.9)

Acceptance testing checks the system behavior against the customer’s requirements, however these may have been expressed; the customers undertake, or specify, typical tasks to check that their requirements have been met or that the organization has identified these for the target market for the software. This testing activity may or may not involve the developers of the system.

2.2.2. Installation testing [Per95:c9; Pfl01:c9s8.6]

Usually after completion of software and acceptance testing, the software can be verified upon installation in the target environment. Installation testing can be viewed as system testing conducted once again according to hardware configuration requirements. Installation procedures may also be verified.

2.2.3. Alpha and beta testing [Kan99:c13]

Before the software is released, it is sometimes given to a small, representative set of potential users for trial use, either in-house (alpha testing) or external (beta testing). These users report problems with the product. Alpha and beta use is often uncontrolled, and is not always referred to in a test plan.

2.2.4. Conformance testing/Functional testing/Correctness testing [Kan99:c7; Per95:c8] (Wak99)

Conformance testing is aimed at validating whether or not the observed behavior of the tested software conforms to its specifications.

2.2.5. Reliability achievement and evaluation [Lyu96:c7; Pfl01:c9s.8.4] (Pos96)

In helping to identify faults, testing is a means to improve reliability. By contrast, by randomly generating test cases according to the operational profile, statistical measures of reliability can be derived. Using reliability growth models, both objectives can be pursued together (see also sub-topic 4.1.4 Life test, reliability evaluation).

2.2.6. Regression testing [Kan99:c7; Per95:c11, c12; Pfl01:c9s8.1] (Rot96)

According to (IEEE610.12-90), regression testing is the “selective retesting of a system or component to verify that modifications have not caused unintended effects...” In practice, the idea is to show that software which previously passed the tests still does. Beizer (Bei90) defines it as any repetition of tests intended to show that the software’s behavior is unchanged, except insofar as required. Obviously a trade-off must be made between the assurance given by regression testing every time a change is made and the resources required to do that.

Regression testing can be conducted at each of the test levels described in topic 2.1 The target of the test and may apply to functional and nonfunctional testing.

2.2.7. Performance testing [Per95:c17; Pfl01:c9s8.3] (Wak99)

This is specifically aimed at verifying that the software meets the specified performance requirements, for instance, capacity and response time. A specific kind of performance testing is volume testing (Per95:p185, p487; Pfl01:p401), in which internal program or system limitations are tried.

2.2.8. Stress testing [Per95:c17; Pfl01:c9s8.3]

Stress testing exercises software at the maximum design load, as well as beyond it.

2.2.9. Back-to-back testing

A single test set is performed on two implemented versions of a software product, and the results are compared.

2.2.10. Recovery testing [Per95:c17; Pfl01:c9s8.3]

Recovery testing is aimed at verifying software restart capabilities after a “disaster.”

2.2.11. Configuration testing [Kan99:c8; Pfl01:c9s8.3]

In cases where software is built to serve different users, configuration testing analyzes the software under the various specified configurations.

2.2.12. Usability testing [Per95:c8; Pfl01:c9s8.3]

This process evaluates how easy it is for end-users to use and learn the software, including user documentation; how effectively the software functions in supporting user tasks; and, finally, its ability to recover from user errors.

2.2.13. Test-driven development [Bec02]

Test-driven development is not a test technique per se; it promotes the use of tests as a surrogate for a requirements specification document rather than as an independent check that the software has correctly implemented the requirements.
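
A rough sketch of the test-first cycle, with invented names: the tests below would be written first, fail against a missing implementation (red), and then drive the simplest code that makes them pass (green):

    import unittest

    # Step 1 (red): these tests are written first and fail until the code below exists.
    class TestShoppingCart(unittest.TestCase):
        def test_total_of_empty_cart_is_zero(self):
            self.assertEqual(Cart().total(), 0)

        def test_total_sums_item_prices(self):
            cart = Cart()
            cart.add(price=3)
            cart.add(price=7)
            self.assertEqual(cart.total(), 10)

    # Step 2 (green): the simplest implementation that makes the tests pass.
    class Cart:
        def __init__(self):
            self._prices = []

        def add(self, price):
            self._prices.append(price)

        def total(self):
            return sum(self._prices)

    if __name__ == "__main__":
        unittest.main()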

 

3. Test Techniques

 

One of the aims of testing is to reveal as much potential for failure as possible, and many techniques have been developed to do this, which attempt to “break” the program, by running one or more tests drawn from identified classes of executions deemed equivalent. The leading principle underlying such techniques is to be as systematic as possible in identifying a representative set of program behaviors; for instance, considering subclasses of the input domain, scenarios, states, and dataflow.

It is difficult to find a homogeneous basis for classifying all techniques, and the one used here must be seen as a compromise. The classification is based on how tests are generated from the software engineer’s intuition and experience, the specifications, the code structure, the (real or artificial) faults to be discovered, the field usage, or, finally, the nature of the application. Sometimes these techniques are classified as white-box, also called glassbox, if the tests rely on information about how the software has been designed or coded, or as black-box if the test cases rely only on the input/output behavior. One last category deals with combined use of two or more techniques. Obviously, these techniques are not used equally often by all practitioners. Included in the list are those that a software engineer should know.

3.1. Based on the software engineer’s intuition and experience

3.1.1. Ad hoc testing [Kan99:c1]

Perhaps the most widely practiced technique remains ad hoc testing: tests are derived relying on the software engineer’s skill, intuition, and experience with similar programs. Ad hoc testing might be useful for identifying special tests, those not easily captured by formalized techniques.

3.1.2. Exploratory testing

Exploratory testing is defined as simultaneous learning, test design, and test execution; that is, the tests are not defined in advance in an established test plan, but are dynamically designed, executed, and modified. The effectiveness of exploratory testing relies on the software engineer’s knowledge, which can be derived from various sources: observed product behavior during testing, familiarity with the application, the platform, the failure process, the type of possible faults and failures, the risk associated with a particular product, and so on. [Kan01:c3]

3.2. Specification-based techniques

3.2.1. Equivalence partitioning [Jor02:c7; Kan99:c7]

The input domain is subdivided into a collection of subsets, or equivalence classes, which are deemed equivalent according to a specified relation, and a representative set of tests (sometimes only one) is taken from each class.
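
For instance, a minimal sketch with an invented fare-calculation function whose age input is partitioned into one invalid and three valid classes, taking a single representative from each:

    # Hypothetical equivalence classes for the "age" input of a fare calculator.
    equivalence_classes = {
        "invalid_negative": range(-10, 0),    # invalid inputs
        "child_0_12":       range(0, 13),
        "adult_13_64":      range(13, 65),
        "senior_65_plus":   range(65, 120),
    }

    def fare(age: int) -> float:
        """Illustrative unit under test."""
        if age < 0:
            raise ValueError("age must be non-negative")
        if age <= 12:
            return 1.0
        if age <= 64:
            return 2.5
        return 1.5

    # One representative test input per class (sometimes more are taken).
    for name, cls in equivalence_classes.items():
        representative = list(cls)[len(cls) // 2]   # pick a mid-class value
        try:
            print(f"{name}: fare({representative}) = {fare(representative)}")
        except ValueError as exc:
            print(f"{name}: fare({representative}) rejected ({exc})")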

3.2.2. Boundary-value analysis [Jor02:c6; Kan99:c7]

Test cases are chosen on and near the boundaries of the input domain of variables, with the underlying rationale that many faults tend to concentrate near the extreme values of inputs. An extension of this technique is robustness testing, wherein test cases are also chosen outside the input domain of variables, to test program robustness to unexpected or erroneous inputs.
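
Continuing the invented age example above, a sketch of boundary-value selection on and around each partition boundary, with a few out-of-domain values added for the robustness variant:

    # Boundaries of the hypothetical age partitions: 0, 12/13, 64/65.
    boundaries = [0, 12, 13, 64, 65]

    # Test on each boundary and one step on either side of it.
    boundary_values = sorted({b + delta for b in boundaries for delta in (-1, 0, 1)})
    print(boundary_values)   # [-1, 0, 1, 11, 12, 13, 14, 63, 64, 65, 66]

    # Robustness extension: values outside the specified input domain.
    robustness_values = [-1000, 10**6, None, "forty"]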

3.2.3. Decision table [Bei90:c10s3] (Jor02)

Decision tables represent logical relationships between conditions (roughly, inputs) and actions (roughly, outputs). Test cases are systematically derived by considering every possible combination of conditions and actions. A related technique is cause-effect graphing. [Pfl01:c9]
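
A small sketch with a made-up discount policy: two boolean conditions give the four columns of the decision table, and each column becomes one test case with its expected action:

    from itertools import product

    def discount(is_member: bool, large_order: bool) -> int:
        """Illustrative action logic: percentage discount granted."""
        if is_member and large_order:
            return 15
        if is_member or large_order:
            return 5
        return 0

    # Decision table: every combination of condition truth values is one column,
    # and each column maps to its expected action.
    expected = {
        (True,  True):  15,
        (True,  False): 5,
        (False, True):  5,
        (False, False): 0,
    }

    for conditions in product([True, False], repeat=2):
        assert discount(*conditions) == expected[conditions], conditions
    print("all decision-table test cases passed")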

3.2.4. Finite-state machine-based [Bei90:c11; Jor02:c8]

By modeling a program as a finite state machine, tests can be selected in order to cover states and transitions on it.
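
A minimal sketch: a made-up door controller is modeled as a set of (state, event) to next-state transitions, and a test set is generated that covers every modeled transition at least once:

    # Hypothetical FSM model of a door controller: (state, event) -> next state.
    transitions = {
        ("closed", "open_cmd"):   "open",
        ("open",   "close_cmd"):  "closed",
        ("closed", "lock_cmd"):   "locked",
        ("locked", "unlock_cmd"): "closed",
    }

    def step(state, event):
        """System under test (here identical to the model, for illustration only)."""
        return transitions.get((state, event), state)

    # Transition-coverage test generation: one test per modeled transition.
    tests = [(s, e, nxt) for (s, e), nxt in transitions.items()]

    for state, event, expected in tests:
        actual = step(state, event)
        assert actual == expected, (state, event, actual, expected)
    print(f"covered {len(tests)} transitions")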

3.2.5. Testing from formal specifications [Zhu97:s2.2] (Ber91; Dic93; Hor95)

Giving the specifications in a formal language allows for automatic derivation of functional test cases, and, at the same time, provides a reference output, an oracle, for checking test results. Methods exist for deriving test cases from model-based (Dic93, Hor95) or algebraic specifications. (Ber91)

3.2.6. Random testing [Bei90:c13; Kan99:c7]

Tests are generated purely at random, not to be confused with statistical testing from the operational profile as described in sub-topic 3.5.1 Operational profile. This form of testing falls under the heading of the specification-based entry, since at least the input domain must be known, to be able to pick random points within it.
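
A sketch of purely random testing over a known input domain, using a trusted library function as the oracle; the implementation under test (a Newton-iteration square root) is invented for the example:

    import math
    import random

    def my_sqrt(x: float) -> float:
        """Illustrative implementation under test (Newton's method)."""
        guess = max(x, 1.0)
        for _ in range(60):
            guess = (guess + x / guess) / 2.0
        return guess

    random.seed(0)
    # The input domain (here positive reals up to 10^6) must be known in advance
    # in order to draw random points from it.
    for _ in range(1000):
        x = random.uniform(1e-6, 1e6)
        assert math.isclose(my_sqrt(x), math.sqrt(x), rel_tol=1e-9), x
    print("1000 random tests passed")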

3.3. Code-based techniques

3.3.1. Control-flow-based criteria [Bei90:c3; Jor02:c10] (Zhu97)

Control-flow-based coverage criteria aim at covering all the statements or blocks of statements in a program, or specified combinations of them. Several coverage criteria have been proposed, like condition/decision coverage. The strongest of the control-flow-based criteria is path testing, which aims to execute all entry-to-exit control flow paths in the flowgraph. Since path testing is generally not feasible because of loops, other less stringent criteria tend to be used in practice, such as statement testing, branch testing, and condition/decision testing. The adequacy of such tests is measured in percentages; for example, when all branches have been executed at least once by the tests, 100% branch coverage is said to have been achieved.
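
As a rough sketch of how branch coverage is measured in practice (real tools instrument the code automatically), the invented function below records each branch outcome it executes, and coverage is the ratio of recorded outcomes to all possible ones:

    covered = set()          # branch outcomes actually executed
    ALL_BRANCHES = {"b1_true", "b1_false", "b2_true", "b2_false"}

    def classify(x: int) -> str:
        """Instrumented unit under test (illustrative)."""
        if x < 0:                       # branch b1
            covered.add("b1_true")
            result = "negative"
        else:
            covered.add("b1_false")
            result = "non-negative"
        if x % 2 == 0:                  # branch b2
            covered.add("b2_true")
            result += ", even"
        else:
            covered.add("b2_false")
            result += ", odd"
        return result

    for test_input in (-3, 4):          # a small test set
        classify(test_input)

    coverage = len(covered) / len(ALL_BRANCHES)
    print(f"branch coverage: {coverage:.0%}")   # -3 and 4 reach 100% here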

3.3.2. Data flow-based criteria [Bei90:c5] (Jor02; Zhu97)

In data-flow-based testing, the control flowgraph is annotated with information about how the program variables are defined, used, and killed (undefined). The strongest criterion, all definition-use paths, requires that, for each variable, every control flow path segment from a definition of that variable to a use of that definition is executed. In order to reduce the number of paths required, weaker strategies such as all-definitions and all-uses are employed.

3.3.3. Reference models for code-based testing (flowgraph, call graph) [Bei90:c3; Jor02:c5].

Although not a technique in itself, the control structure of a program is graphically represented using a flowgraph in code-based testing techniques. A flowgraph is a directed graph whose nodes and arcs correspond to program elements. For instance, nodes may represent statements or uninterrupted sequences of statements, and arcs the transfer of control between nodes.

3.4. Fault-based techniques (Mor90)

With different degrees of formalization, fault-based testing techniques devise test cases specifically aimed at revealing categories of likely or predefined faults.

3.4.1. Error guessing [Kan99:c7]

In error guessing, test cases are specifically designed by software engineers trying to figure out the most plausible faults in a given program. A good source of information is the history of faults discovered in earlier projects, as well as the software engineer’s expertise.

3.4.2. Mutation testing [Per95:c17; Zhu97:s3.2-s3.3]

A mutant is a slightly modified version of the program under test, differing from it by a small, syntactic change. Every test case exercises both the original and all generated mutants: if a test case is successful in identifying the difference between the program and a mutant, the latter is said to be “killed.” Originally conceived as a technique to evaluate a test set (see 4.2), mutation testing is also a testing criterion in itself: either tests are randomly generated until enough mutants have been killed, or tests are specifically designed to kill surviving mutants. In the latter case, mutation testing can also be categorized as a code-based technique. The underlying assumption of mutation testing, the coupling effect, is that by looking for simple syntactic faults, more complex but real faults will be found. For the technique to be effective, a large number of mutants must be automatically derived in a systematic way.
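
A hand-written sketch of the idea (real mutation tools derive large numbers of mutants automatically): two mutants of an invented function each differ from it by one syntactic change, and the fraction of mutants killed is the mutation score discussed in sub-topic 4.2.3:

    def is_adult(age):
        """Original program under test."""
        return age >= 18

    # Mutants: each differs from the original by one small syntactic change.
    def mutant_relop(age):       # '>=' mutated to '>'
        return age > 18

    def mutant_constant(age):    # constant 18 mutated to 17
        return age >= 17

    tests = [(18, True), (17, False), (30, True)]

    def killed(mutant):
        """A mutant is killed if some test case distinguishes it from the original."""
        return any(mutant(arg) != expected for arg, expected in tests)

    mutants = [mutant_relop, mutant_constant]
    score = sum(killed(m) for m in mutants) / len(mutants)
    print(f"mutation score: {score:.0%}")   # 100%, thanks to the boundary test at 18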

3.5. Usage-based techniques

3.5.1. Operational profile [Jor02:c15; Lyu96:c5; Pfl01:c9]

In testing for reliability evaluation, the test environment must reproduce the operational environment of the software as closely as possible. The idea is to infer, from the observed test results, the future reliability of the software when in actual use. To do this, inputs are assigned a probability distribution, or profile, according to their occurrence in actual operation.
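
A sketch of sampling test cases from an operational profile; the operations and their estimated field-usage probabilities are invented:

    import random

    # Hypothetical operational profile: estimated probabilities of each operation in the field.
    operational_profile = {
        "browse_catalog": 0.60,
        "search":         0.25,
        "checkout":       0.10,
        "admin_report":   0.05,
    }

    random.seed(1)
    operations = list(operational_profile)
    weights = list(operational_profile.values())

    # Draw 1000 test cases with frequencies matching expected operational usage.
    sample = random.choices(operations, weights=weights, k=1000)

    for op in operations:
        print(f"{op:15s} {sample.count(op):4d} / 1000")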

3.5.2. Software Reliability Engineered Testing [Lyu96:c6]

Software Reliability Engineered Testing (SRET) is a testing method encompassing the whole development process, whereby testing is “designed and guided by reliability objectives and expected relative usage and criticality of different functions in the field.”

3.6. Techniques based on the nature of the application

The above techniques apply to all types of software. However, for some kinds of applications, some additional know-how is required for test derivation. A list of a few specialized testing fields is provided here, based on the nature of the application under test:

  • Object-oriented testing [Jor02:c17; Pfl01:c8s7.5] (Bin00)

  • Component-based testing

  • Web-based testing

  • GUI testing [Jor02:c20]

  • Testing of concurrent programs (Car91)

  • Protocol conformance testing (Pos96; Boc94)

  • Testing of real-time systems (Sch94)

  • Testing of safety-critical systems (IEEE1228-94)

3.7. Selecting and combining techniques

3.7.1. Functional and structural [Bei90:c1s.2.2; Jor02:c2, c9, c12; Per95:c17] (Pos96)

Specification-based and code-based test techniques are often contrasted as functional vs. structural testing. These two approaches to test selection are not to be seen as alternative but rather as complementary; in fact, they use different sources of information and have proved to highlight different kinds of problems. They could be used in combination, depending on budgetary considerations.

3.7.2. Deterministic vs. random (Ham92; Lyu96:p541-547)

Test cases can be selected in a deterministic way, according to one of the various techniques listed, or randomly drawn from some distribution of inputs, such as is usually done in reliability testing. Several analytical and empirical comparisons have been conducted to analyze the conditions that make one approach more effective than the other.

 

4. Test-related measures

 

Sometimes, test techniques are confused with test objectives. Test techniques are to be viewed as aids which help to ensure the achievement of test objectives. For instance, branch coverage is a popular test technique. Achieving a specified branch coverage measure should not be considered the objective of testing per se: it is a means to improve the chances of finding failures by systematically exercising every program branch out of a decision point. To avoid such misunderstandings, a clear distinction should be made between test-related measures, which provide an evaluation of the program under test based on the observed test outputs, and those which evaluate the thoroughness of the test set. Additional information on measurement programs is provided in the Software Engineering Management KA, subarea 6, Software engineering measurement. Additional information on measures can be found in the Software Engineering Process KA, subarea 4, Process and product measurement.

Measurement is usually considered instrumental to quality analysis. Measurement may also be used to optimize the planning and execution of the tests. Test management can use several process measures to monitor progress. Measures relative to the test process for management purposes are considered in topic 5.1 Practical considerations.

4.1. Evaluation of the program under test (IEEE982.1-88)

4.1.1. Program measurements to aid in planning and designing testing [Bei90:c7s4.2; Jor02:c9] (Ber96; IEEE982.1-88)

Measures based on program size (for example, source lines of code or function points) or on program structure (like complexity) are used to guide testing. Structural measures can also include measurements among program modules in terms of the frequency with which modules call each other.

4.1.2. Fault types, classification, and statistics [Bei90:c2; Jor02:c2; Pfl01:c8] (Bei90; IEEE1044-93; Kan99; Lyu96)

The testing literature is rich in classifications and taxonomies of faults. To make testing more effective, it is important to know which types of faults could be found in the software under test, and the relative frequency with which these faults have occurred in the past. This information can be very useful in making quality predictions, as well as for process improvement. More information can be found in the Software Quality KA, topic 3.2 Defect characterization. An IEEE standard exists on how to classify software “anomalies” (IEEE1044-93).

4.1.3. Fault density [Per95:c20] (IEEE982.1-88; Lyu96:c9)

A program under test can be assessed by counting and classifying the discovered faults by their types. For each fault class, fault density is measured as the ratio between the number of faults found and the size of the program.
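
For example, a sketch of the computation with invented counts, normalizing by program size in thousands of lines of code (KLOC):

    # Hypothetical discovered-fault counts per class and program size.
    faults_by_class = {"logic": 14, "interface": 6, "data-handling": 4}
    size_kloc = 12.0     # program size in thousands of lines of code

    for fault_class, count in faults_by_class.items():
        print(f"{fault_class:14s} density = {count / size_kloc:.2f} faults/KLOC")

    total_density = sum(faults_by_class.values()) / size_kloc
    print(f"overall fault density = {total_density:.2f} faults/KLOC")   # 2.00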

4.1.4. Life test, reliability evaluation [Pfl01:c9] (Pos96:p146-154)

A statistical estimate of software reliability, which can be obtained by reliability achievement and evaluation (see sub-topic 2.2.5), can be used to evaluate a product and decide whether or not testing can be stopped.

4.1.5. Reliability growth models [Lyu96:c7; Pfl01:c9] (Lyu96:c3, c4)

Reliability growth models provide a prediction of reliability based on the failures observed under reliability achievement and evaluation (see sub-topic 2.2.5). They assume, in general, that the faults that caused the observed failures have been fixed (although some models also accept imperfect fixes), and thus, on average, the product’s reliability exhibits an increasing trend. Dozens of models have now been published. Many are built on some common assumptions, while others differ. Notably, these models are divided into failure-count and time-between-failure models.
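
As one concrete illustration (many other models exist), the classical Goel-Okumoto failure-count model takes the expected cumulative number of failures by test time t to be mu(t) = a(1 - exp(-b t)); with fitted parameters it predicts how many failures remain to be observed. The parameter values below are invented:

    import math

    def goel_okumoto_mu(t: float, a: float, b: float) -> float:
        """Expected cumulative failures by test time t under the Goel-Okumoto model."""
        return a * (1.0 - math.exp(-b * t))

    # Hypothetical fitted parameters: a = expected total failures, b = detection rate.
    a, b = 120.0, 0.05
    observed_time = 40.0   # e.g. 40 days of reliability testing

    expected_so_far = goel_okumoto_mu(observed_time, a, b)
    expected_remaining = a - expected_so_far
    print(f"expected failures found by t={observed_time:.0f}: {expected_so_far:.1f}")
    print(f"expected failures remaining: {expected_remaining:.1f}")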

4.2. Evaluation of the tests performed

4.2.1. Coverage/thoroughness measures [Jor02:c9; Pfl01:c8] (IEEE982.1-88)

Several test adequacy criteria require that the test cases systematically exercise a set of elements identified in the program or in the specifications (see subarea 3). To evaluate the thoroughness of the executed tests, testers can monitor the elements covered, so that they can dynamically measure the ratio between covered elements and their total number. For example, it is possible to measure the percentage of covered branches in the program flowgraph, or that of the functional requirements exercised among those listed in the specifications document. Code-based adequacy criteria require appropriate instrumentation of the program under test.
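
A sketch of a specification-side thoroughness measure: each executed test case is traced to the functional requirements it exercises, and coverage is the ratio of exercised requirements to those listed in the specification. The requirement and test identifiers are invented:

    # Functional requirements listed in the (hypothetical) specification.
    specified_requirements = {"R1", "R2", "R3", "R4", "R5"}

    # Trace of executed test cases to the requirements each one exercises.
    executed_tests = {
        "TC-01": {"R1", "R2"},
        "TC-02": {"R2", "R3"},
        "TC-03": {"R5"},
    }

    covered = set().union(*executed_tests.values())
    coverage = len(covered & specified_requirements) / len(specified_requirements)
    print(f"requirements covered: {sorted(covered)}")
    print(f"coverage: {coverage:.0%}")   # 4 of 5 requirements -> 80%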

4.2.2. Fault seeding [Pfl01:c8] (Zhu97:s3.1)

Some faults are artificially introduced into the program before test. When the tests are executed, some of these seeded faults will be revealed, and possibly some faults which were already there will be as well. In theory, depending on which of the artificial faults are discovered, and how many, testing effectiveness can be evaluated, and the remaining number of genuine faults can be estimated. In practice, statisticians question the distribution and representativeness of seeded faults relative to genuine faults and the small sample size on which any extrapolations are based. Some also argue that this technique should be used with great care, since inserting faults into software involves the obvious risk of leaving them there.
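
A sketch of the classical seeding estimator (often attributed to Mills): if testing reveals s of the S seeded faults and n genuine faults, and seeded faults are assumed to be as easy to find as genuine ones, then the total number of genuine faults is estimated as n*S/s. The counts below are invented, and the caveats above apply:

    seeded_total = 20      # S: faults deliberately inserted before testing
    seeded_found = 12      # s: seeded faults revealed by the tests
    genuine_found = 30     # n: genuine (non-seeded) faults revealed by the tests

    # Assuming seeded and genuine faults are equally easy to find (a strong and
    # much-criticized assumption), the same detection ratio applies to both.
    estimated_genuine_total = genuine_found * seeded_total / seeded_found
    estimated_remaining = estimated_genuine_total - genuine_found

    print(f"estimated genuine faults in total:  {estimated_genuine_total:.0f}")  # 50
    print(f"estimated genuine faults remaining: {estimated_remaining:.0f}")      # 20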

4.2.3. Mutation score [Zhu97:s3.2-s3.3]

In mutation testing (see sub-topic 3.4.2), the ratio of killed mutants to the total number of generated mutants can be a measure of the effectiveness of the executed test set.

4.2.4. Comparison and relative effectiveness of different techniques [Jor02:c9, c12; Per95:c17; Zhu97:s5] (Fra93; Fra98; Pos96: p64-72)

Several studies have been conducted to compare the relative effectiveness of different test techniques. It is important to be precise as to the property against which the techniques are being assessed; what, for instance, is the exact meaning given to the term “effectiveness”? Possible interpretations are: the number of tests needed to find the first failure, the ratio of the number of faults found through testing to all the faults found during and after testing, or how much reliability was improved. Analytical and empirical comparisons between different techniques have been conducted according to each of the notions of effectiveness specified above.

 

5. Test Process

 

Testing concepts, strategies, techniques, and measures need to be integrated into a defined and controlled process which is run by people. The test process supports testing activities and provides guidance to testing teams, from test planning to test output evaluation, in such a way as to provide justified assurance that the test objectives will be met cost-effectively.

5.1. Practical considerations

5.1.1. Attitudes/Egoless programming [Bei90:c13s3.2; Pfl01:c8]

A very important component of successful testing is a collaborative attitude towards testing and quality assurance activities. Managers have a key role in fostering a generally favorable reception towards failure discovery during development and maintenance; for instance, by preventing a mindset of code ownership among programmers, so that they will not feel responsible for failures revealed by their code.

5.1.2. Test guides [Kan01]

The testing phases could be guided by various aims, for example: in risk-based testing, which uses the product risks to prioritize and focus the test strategy; or in scenario-based testing, in which test cases are defined based on specified software scenarios.

5.1.3. Test process management [Bec02: III; Per95:c1-c4; Pfl01:c9] (IEEE1074-97; IEEE12207.0-96:s5.3.9, s5.4.2, s6.4, s6.5)

Test activities conducted at different levels (see subarea 2. Test levels) must be organized, together with people, tools, policies, and measurements, into a well-defined process which is an integral part of the life cycle. In IEEE/EIA Standard 12207.0, testing is not described as a stand-alone process, but principles for testing activities are included along with both the five primary life cycle processes and the supporting process. In IEEE Std 1074, testing is grouped with other evaluation activities as integral to the entire life cycle.

5.1.4. Test documentation and work products [Bei90:c13s5; Kan99:c12; Per95:c19; Pfl01:c9s8.8] (IEEE829-98)

Documentation is an integral part of the formalization of the test process. The IEEE Standard for Software Test Documentation (IEEE829-98) provides a good description of test documents and of their relationship with one another and with the testing process. Test documents may include, among others, Test Plan, Test Design Specification, Test Procedure Specification, Test Case Specification, Test Log, and Test Incident or Problem Report. The software under test is documented as the Test Item. Test documentation should be produced and continually updated, to the same level of quality as other types of documentation in software engineering.

5.1.5. Internal vs. independent test team [Bei90:c13s2.2-c13s2.3; Kan99:c15; Per95:c4; Pfl01:c9]

Formalization of the test process may involve formalizing the test team organization as well. The test team can be composed of internal members (that is, on the project team, involved or not in software construction), of external members, in the hope of bringing in an unbiased, independent perspective, or, finally, of both internal and external members. Considerations of costs, schedule, maturity levels of the involved organizations, and criticality of the application may determine the decision.

5.1.6. Cost/effort estimation and other process measures [Per95:c4, c21] (Per95: Appendix B; Pos96:p139-145; IEEE982.1-88)

Several measures related to the resources spent on testing, as well as to the relative fault-finding effectiveness of the various test phases, are used by managers to control and improve the test process. These test measures may cover such aspects as number of test cases specified, number of test cases executed, number of test cases passed, and number of test cases failed, among others.

Evaluation of test phase reports can be combined with root-cause analysis to evaluate test process effectiveness in finding faults as early as possible. Such an evaluation could be associated with the analysis of risks. Moreover, the resources that are worth spending on testing should be commensurate with the use/criticality of the application: different techniques have different costs and yield different levels of confidence in product reliability.

5.1.7. Termination [Bei90:c2s2.4; Per95:c2]

A decision must be made as to how much testing is enough and when a test stage can be terminated. Thoroughness measures, such as achieved code coverage or functional completeness, as well as estimates of fault density or of operational reliability, provide useful support, but are not sufficient in themselves. The decision also involves considerations about the costs and risks incurred by the potential for remaining failures, as opposed to the costs implied by continuing to test. See also sub-topic 1.2.1 Test selection criteria/Test adequacy criteria.

5.1.8. Test reuse and test patterns [Bei90:c13s5]

To carry out testing or maintenance in an organized and cost-effective way, the means used to test each part of the software should be reused systematically. This repository of test materials must be under the control of software configuration management, so that changes to software requirements or design can be reflected in changes to the scope of the tests conducted.

The test solutions adopted for testing some application types under certain circumstances, with the motivations behind the decisions taken, form a test pattern which can itself be documented for later reuse in similar projects.

5.2. Test Activities

Under this topic, a brief overview of test activities is given; as often implied by the following description, successful management of test activities strongly depends on the Software Configuration Management process.

5.2.1. Planning [Kan99:c12; Per95:c19; Pfl01:c8s7.6] (IEEE829-98:s4; IEEE1008-87:s1-s3)

Like any other aspect of project management, testing activities must be planned. Key aspects of test planning include coordination of personnel, management of available test facilities and equipment (which may include magnetic media, test plans and procedures), and planning for possible undesirable outcomes. If more than one baseline of the software is being maintained, then a major planning consideration is the time and effort needed to ensure that the test environment is set to the proper configuration.

5.2.2. Test-case generation [Kan99:c7] (Pos96:c2; IEEE1008-87:s4, s5)

Generation of test cases is based on the level of testing to be performed and the particular testing techniques. Test cases should be under the control of software configuration management and include the expected results for each test.

5.2.3. Test environment development [Kan99:c11]

The environment used for testing should be compatible with the software engineering tools. It should facilitate development and control of test cases, as well as logging and recovery of expected results, scripts, and other testing materials.

5.2.4. Execution [Bei90:c13; Kan99:c11] (IEEE1008-87:s6, s7)

Execution of tests should embody a basic principle of scientific experimentation: everything done during testing should be performed and documented clearly enough that another person could replicate the results. Hence, testing should be performed in accordance with documented procedures using a clearly defined version of the software under test.

5.2.5. Test results evaluation [Per95:c20,c21] (Pos96:p18-20, p131-138)

The results of testing must be evaluated to determine whether or not the test has been successful. In most cases, “successful” means that the software performed as expected and did not have any major unexpected outcomes. Not all unexpected outcomes are necessarily faults, however, but could be judged to be simply noise. Before a failure can be removed, an analysis and debugging effort is needed to isolate, identify, and describe it. When test results are particularly important, a formal review board may be convened to evaluate them.

5.2.6. Problem reporting/Test log [Kan99:c5, c6; Per95:c20] (IEEE829-98:s9-s10)

Testing activities can be entered into a test log to identify when a test was conducted, who performed the test, what software configuration was the basis for testing, and other relevant identification information. Unexpected or incorrect test results can be recorded in a problem-reporting system, the data of which form the basis for later debugging and for fixing the problems that were observed as failures during testing. Also, anomalies not classified as faults could be documented in case they later turn out to be more serious than first thought. Test reports are also an input to the change management request process (see the Software Configuration Management KA, subarea 3, Software configuration control).

5.2.7. Defect tracking

Failures observed during testing are most often due to faults or defects in the software. Such defects can be analyzed to determine when they were introduced into the software, what kind of error caused them to be created (poorly defined requirements, incorrect variable declaration, memory leak, programming syntax error, for example), and when they could have been first observed in the software. Defect-tracking information is used to determine what aspects of software engineering need improvement and how effective previous analyses and testing have been.

 

RECOMMENDED REFERENCES FOR SOFTWARE TESTING

  • [Bec02] K. Beck, Test-Driven Development by Example, Addison-Wesley, 2002.

  • [Bei90] B. Beizer, Software Testing Techniques, International Thomson Press, 1990, Chap. 1-3, 5, 7s4, 10s3, 11, 13.

  • [Jor02] P. C. Jorgensen, Software Testing: A Craftsman's Approach, second ed., CRC Press, 2004, Chap. 2, 5-10, 12-15, 17, 20.

  • [Kan99] C. Kaner, J. Falk, and H.Q. Nguyen, Testing Computer Software, second ed., John Wiley & Sons, 1999, Chaps. 1, 2, 5-8, 11-13, 15.

  • [Kan01] C. Kaner, J. Bach, and B. Pettichord, Lessons Learned in Software Testing, Wiley Computer Publishing, 2001.

  • [Lyu96] M.R. Lyu, Handbook of Software Reliability Engineering, McGraw-Hill/IEEE, 1996, Chap. 2s2.2, 5-7.

  • [Per95] W. Perry, Effective Methods for Software Testing, John Wiley & Sons, 1995, Chap. 1-4, 9, 10-12, 17, 19-21.

  • [Pfl01] S. L. Pfleeger, Software Engineering: Theory and Practice, second ed., Prentice Hall, 2001, Chap. 8, 9.

  • [Zhu97] H. Zhu, P.A.V. Hall and J.H.R. May, “Software Unit Test Coverage and Adequacy,” ACM Computing Surveys, vol. 29, iss. 4 (Sections 1, 2.2, 3.2, 3.3), Dec. 1997, pp. 366-427.

APPENDIX A. LIST OF FURTHER READINGS

  • (Bac90) R. Bache and M. Müllerburg, “Measures of Testability as a Basis for Quality Assurance,” Software Engineering Journal, vol. 5, March 1990, pp. 86-92.

  • (Bei90) B. Beizer, Software Testing Techniques, International Thomson Press, second ed., 1990.

  • (Ber91) G. Bernot, M.C. Gaudel and B. Marre, “Software Testing Based On Formal Specifications: a Theory and a Tool,” Software Engineering Journal, Nov. 1991, pp. 387-405.

  • (Ber96) A. Bertolino and M. Marrè, “How Many Paths Are Needed for Branch Testing?” Journal of Systems and Software, vol. 35, iss. 2, 1996, pp. 95-106.

  • (Ber96a) A. Bertolino and L. Strigini, “On the Use of Testability Measures for Dependability Assessment,” IEEE Transactions on Software Engineering, vol. 22, iss. 2, Feb. 1996, pp. 97-108.

  • (Bin00) R.V. Binder, Testing Object-Oriented Systems: Models, Patterns, and Tools, Addison-Wesley, 2000.

  • (Boc94) G.V. Bochmann and A. Petrenko, “Protocol Testing: Review of Methods and Relevance for Software Testing,” presented at ACM Proc. Int’l Symp. on Software Testing and Analysis (ISSTA ’94), Seattle, Wash., 1994.

  • (Car91) R.H. Carver and K.C. Tai, “Replay and Testing for Concurrent Programs,” IEEE Software, March 1991, pp. 66-74.

  • (Dic93) J. Dick and A. Faivre, “Automating the Generation and Sequencing of Test Cases from Model-Based Specifications,” presented at FME ’93: Industrial-Strength Formal Methods, LNCS 670, Springer-Verlag, 1993.

  • (Fra93) P. Frankl and E. Weyuker, “A Formal Analysis of the Fault Detecting Ability of Testing Methods,” IEEE Transactions on Software Engineering, vol. 19, iss. 3, March 1993, p. 202.

  • (Fra98) P. Frankl, D. Hamlet, B. Littlewood, and L. Strigini, “Evaluating Testing Methods by Delivered Reliability,” IEEE Transactions on Software Engineering, vol. 24, iss. 8, August 1998, pp. 586-601.

  • (Ham92) D. Hamlet, “Are We Testing for True Reliability?” IEEE Software, July 1992, pp. 21-27.

  • (Hor95) H. Horcher and J. Peleska, “Using Formal Specifications to Support Software Testing,” Software Quality Journal, vol. 4, 1995, pp. 309-327.

  • (How76) W. E. Howden, “Reliability of the Path Analysis Testing Strategy,” IEEE Transactions on Software Engineering, vol. 2, iss. 3, Sept. 1976, pp. 208-215.

  • (Jor02) P.C. Jorgensen, Software Testing: A Craftsman’s Approach, second ed., CRC Press, 2004.

  • (Kan99) C. Kaner, J. Falk, and H.Q. Nguyen, Testing Computer Software, second ed., John Wiley & Sons, 1999.

  • (Lyu96) M.R. Lyu, Handbook of Software Reliability Engineering, McGraw-Hill/IEEE, 1996.

  • (Mor90) L.J. Morell, “A Theory of Fault-Based Testing,” IEEE Transactions on Software Engineering, vol. 16, iss. 8, August 1990, pp. 844-857.

  • (Ost88) T.J. Ostrand and M.J. Balcer, “The Category-Partition Method for Specifying and Generating Functional Tests,” Communications of the ACM, vol. 31, iss. 3, June 1988, pp. 676-686.

  • (Ost98) T. Ostrand, A. Anodide, H. Foster, and T. Goradia, “A Visual Test Development Environment for GUI Systems,” presented at ACM Proc. Int’l Symp. on Software Testing and Analysis (ISSTA ’98), Clearwater Beach, Florida, 1998.

  • (Per95) W. Perry, Effective Methods for Software Testing, John Wiley & Sons, 1995.

  • (Pfl01) S.L. Pfleeger, Software Engineering: Theory and Practice, second ed., Prentice-Hall, 2001, Chap. 8, 9.

  • (Pos96) R.M. Poston, Automating Specification-Based Software Testing, IEEE, 1996.

  • (Rot96) G. Rothermel and M.J. Harrold, “Analyzing Regression Test Selection Techniques,” IEEE Transactions on Software Engineering, vol. 22, iss. 8, Aug. 1996, p. 529.

  • (Sch94) W. Schütz, “Fundamental Issues in Testing Distributed Real-Time Systems,” Real-Time Systems Journal, vol. 7, iss. 2, Sept. 1994, pp. 129-157.

  • (Voa95) J.M. Voas and K.W. Miller, “Software Testability: The New Verification,” IEEE Software, May 1995, pp. 17-28.

  • (Wak99) S. Wakid, D.R. Kuhn, and D.R. Wallace, “Toward Credible IT Testing and Certification,” IEEE Software, July-Aug. 1999, pp. 39-47.

  • (Wey82) E.J. Weyuker, “On Testing Non-testable Programs,” The Computer Journal, vol. 25, iss. 4, 1982, pp. 465-470.

  • (Wey83) E.J. Weyuker, “Assessing Test Data Adequacy through Program Inference,” ACM Trans. on Programming Languages and Systems, vol. 5, iss. 4, October 1983, pp. 641-655.

  • (Wey91) E.J. Weyuker, S.N. Weiss, and D. Hamlet, “Comparison of Program Test Strategies,” presented at Proc. Symp. on Testing, Analysis and Verification (TAV4), Victoria, British Columbia, 1991.

  • (Zhu97) H. Zhu, P.A.V. Hall, and J.H.R. May, “Software Unit Test Coverage and Adequacy,” ACM Computing Surveys, vol. 29, iss. 4, Dec. 1997, pp. 366-427.

APPENDIX B. LIST OF STANDARDS

  • (IEEE610.12-90) IEEE Std 610.12-1990 (R2002), IEEE Standard Glossary of Software Engineering Terminology, IEEE, 1990.

  • (IEEE829-98) IEEE Std 829-1998, Standard for Software Test Documentation, IEEE, 1998.

  • (IEEE982.1-88) IEEE Std 982.1-1988, IEEE Standard Dictionary of Measures to Produce Reliable Software, IEEE, 1988.

  • (IEEE1008-87) IEEE Std 1008-1987 (R2003), IEEE Standard for Software Unit Testing, IEEE, 1987.

  • (IEEE1044-93) IEEE Std 1044-1993 (R2002), IEEE Standard for the Classification of Software Anomalies, IEEE, 1993.

  • (IEEE1228-94) IEEE Std 1228-1994, Standard for Software Safety Plans, IEEE, 1994.

  • (IEEE12207.0-96) IEEE/EIA 12207.0-1996 // ISO/IEC12207:1995, Industry Implementation of Int. Std. ISO/IEC 12207:95, Standard for Information Technology-Software Life Cycle Processes, IEEE, 1996.
