Quantitative Analysis of Faults and Failures in a Complex Software System
August 2000 (vol. 26 no. 8)
pp. 797-814

Abstract—The dearth of published empirical data on major industrial systems has been one of the reasons that software engineering has failed to establish a proper scientific basis. In this paper, we hope to provide a small contribution to the body of empirical knowledge. We describe a number of results from a quantitative study of faults and failures in two releases of a major commercial system. We tested a range of basic software engineering hypotheses relating to: The Pareto principle of distribution of faults and failures; the use of early fault data to predict later fault and failure data; metrics for fault prediction; and benchmarking fault data. For example, we found strong evidence that a small number of modules contain most of the faults discovered in prerelease testing and that a very small number of modules contain most of the faults discovered in operation. However, in neither case is this explained by the size or complexity of the modules. We found no evidence to support previous claims relating module size to fault density nor did we find evidence that popular complexity metrics are good predictors of either fault-prone or failure-prone modules. We confirmed that the number of faults discovered in prerelease testing is an order of magnitude greater than the number discovered in 12 months of operational use. We also discovered fairly stable numbers of faults discovered at corresponding testing phases. Our most surprising and important result was strong evidence of a counter-intuitive relationship between pre- and postrelease faults: Those modules which are the most fault-prone prerelease are among the least fault-prone postrelease, while conversely, the modules which are most fault-prone postrelease are among the least fault-prone prerelease. This observation has serious ramifications for the commonly used fault density measure. 
Not only is it misleading to use it as a surrogate quality measure, but its previous extensive use in metrics studies is shown to be flawed. Our results provide data points in building up an empirical picture of the software development process. However, even the strong results we have observed are not generally valid as software engineering laws because they fail to take account of basic explanatory data, notably testing effort and operational usage. After all, a module which has not been tested or used will reveal no faults, irrespective of its size, complexity, or any other factor.
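
The Pareto-style hypothesis in the abstract (a small number of modules containing most of the faults) can be checked with a short calculation. The sketch below is illustrative only: the function name, the 20 percent cutoff, and the fault counts are invented assumptions, not data or methods taken from the study.

```python
from math import ceil


def fault_share_of_top_modules(fault_counts, top_fraction=0.2):
    """Return the share of all faults found in the most fault-prone modules.

    fault_counts: hypothetical per-module fault counts (not from the paper).
    top_fraction: fraction of modules to inspect, e.g. 0.2 for the top 20%.
    """
    if not fault_counts:
        raise ValueError("fault_counts must not be empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    total = sum(fault_counts)
    if total == 0:
        # No faults recorded at all, so no module holds any share.
        return 0.0

    # Rank modules from most to least fault-prone and keep the top slice.
    ranked = sorted(fault_counts, reverse=True)
    top_n = max(1, ceil(top_fraction * len(ranked)))
    return sum(ranked[:top_n]) / total


# Invented example: 10 modules, two of which hold most of the faults.
example_counts = [40, 25, 5, 4, 3, 2, 1, 0, 0, 0]
share = fault_share_of_top_modules(example_counts)
print(f"Top 20% of modules hold {share:.1%} of faults")
```

A high share on its own says nothing about the cause. As the abstract cautions, testing effort and operational usage would be needed before drawing conclusions from such a figure.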

Index Terms:
Software faults and failures, software metrics, empirical studies.
Norman E. Fenton, Niclas Ohlsson, "Quantitative Analysis of Faults and Failures in a Complex Software System," IEEE Transactions on Software Engineering, vol. 26, no. 8, pp. 797-814, Aug. 2000, doi:10.1109/32.879815