Predicting Fault Incidence Using Software Change History
July 2000 (vol. 26 no. 7)
pp. 653-661

Abstract—This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.
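The last sentence of the abstract describes a model in which each past change adds to a module's fault potential, with large, recent changes weighted most heavily. A minimal sketch of that idea follows. The abstract does not give the functional form, so everything specific here is an assumption made for illustration: the exponential decay with age, the logarithmic size term, the `decay_per_year` and `scale` parameters, and the `Change` record itself.

```python
import math
from dataclasses import dataclass


@dataclass
class Change:
    """One recorded change to a module (hypothetical record layout)."""
    age_years: float    # time elapsed since the change was made
    lines_changed: int  # size of the change


def fault_potential(changes, decay_per_year=0.75, scale=1.0):
    """Sum one contribution per change; recent, large changes weigh most.

    Assumed form (not taken from the paper):
        scale * sum(exp(-decay_per_year * age) * log(1 + lines_changed))
    """
    return scale * sum(
        math.exp(-decay_per_year * c.age_years) * math.log1p(c.lines_changed)
        for c in changes
    )


# Two modules with identical change sizes but different change dates.
recently_changed = [Change(0.1, 200), Change(0.5, 50)]
long_untouched = [Change(4.0, 200), Change(5.0, 50)]

print(fault_potential(recently_changed) > fault_potential(long_untouched))
```

With these assumed weights, the module whose changes are recent scores higher than the one whose changes of the same size are several years old, which is the qualitative behaviour the abstract reports. In practice the decay rate and scale would be estimated from the change history rather than fixed by hand.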

Index Terms:
Fault potential, code decay, change management data, metrics, statistical analysis, generalized linear models.
Todd L. Graves, Alan F. Karr, J.S. Marron, Harvey Siy, "Predicting Fault Incidence Using Software Change History," IEEE Transactions on Software Engineering, vol. 26, no. 7, pp. 653-661, July 2000, doi:10.1109/32.859533