Generating test data sets that are large enough to cover all the tests required before a software component can be certified as reliable is a time-consuming and error-prone task when carried out manually. A key parameter when testing collections is the size of the collection to be tested: an automatic test generator builds a set of collections containing n elements, where n ranges from 0 to n_crit. Data coverage analysis allows us to determine rigorously a collection size n_crit such that testing with collections of size > n_crit does not provide any further useful information, i.e., will not uncover any new faults.
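The role of the size bound in such a generator can be illustrated with a minimal sketch. It assumes a deliberately simple test model that is not taken from the paper: for each size n from 0 to n_crit, every ordering of the values 0..n-1 is emitted as one test collection. The function name `generate_collections` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical generator (an illustration, not the authors' tool):
// returns every ordering of the values 0..n-1 for each size n in [0, n_crit].
std::vector<std::vector<int>> generate_collections(std::size_t n_crit) {
    std::vector<std::vector<int>> out;
    for (std::size_t n = 0; n <= n_crit; ++n) {
        std::vector<int> v(n);
        std::iota(v.begin(), v.end(), 0);  // ascending start: 0, 1, ..., n-1
        do {
            out.push_back(v);              // one test collection of size n
        } while (std::next_permutation(v.begin(), v.end()));
    }
    return out;
}
```

Under this assumed model the number of generated collections is the sum of n! for n from 0 to n_crit, so the test set grows rapidly with the bound. This is why a rigorously justified, small n_crit matters.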
We conducted a series of experiments on modules from the C++ Standard Template Library which were seeded with errors. Using a test model appropriate to each module, we generated data sets of sizes up to and exceeding the predicted value of n_crit and verified that after all collections of size ≤ n_crit have been tested, no further errors are discovered.
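The shape of such an experiment can be sketched as follows. The two faulty sort routines are hypothetical seeded errors invented for this illustration, not faults from the study, and `first_revealing_size` is an assumed helper that reports the smallest collection size at which a fault is exposed when every ordering of 0..n-1 is tried.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical seeded fault 1: the outer loop starts at 2 instead of 1,
// so the element at index 1 is never inserted into place.
void faulty_insertion_sort(std::vector<int>& v) {
    for (std::size_t i = 2; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
}

// Hypothetical seeded fault 2: only a single bubble pass is performed.
void single_pass_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i - 1] > v[i]) std::swap(v[i - 1], v[i]);
}

// Reference implementation with no seeded fault.
void correct_sort(std::vector<int>& v) { std::sort(v.begin(), v.end()); }

// Smallest size n <= max_size at which some ordering of 0..n-1 is left
// unsorted by `sorter`, or -1 if no tested collection reveals a fault.
int first_revealing_size(void (*sorter)(std::vector<int>&), int max_size) {
    for (int n = 0; n <= max_size; ++n) {
        std::vector<int> input(static_cast<std::size_t>(n));
        std::iota(input.begin(), input.end(), 0);
        do {
            std::vector<int> work = input;  // keep the permutation state intact
            sorter(work);
            if (!std::is_sorted(work.begin(), work.end())) return n;
        } while (std::next_permutation(input.begin(), input.end()));
    }
    return -1;
}
```

In this toy setting, calling `first_revealing_size` for each seeded fault with a `max_size` above the predicted bound gives the size at which that fault first appears. The largest such size over all seeded faults can then be compared against the predicted n_crit.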
Data coverage was also compared with statement coverage testing and random test data set generation. The three techniques were compared by the number of errors revealed relative to the number of test data sets used. Statement coverage testing was confirmed as the cheapest technique, in the sense that it reaches its maximal effect with the smallest number of tests, but also the least effective in terms of the number of errors uncovered. Data coverage was significantly better than random test generation: it uncovered more faults with fewer tests at every point.