Machine Learning Approaches to Estimating Software Development Effort
February 1995 (vol. 21 no. 2)
pp. 126-137
Accurate estimation of software development effort is critical in software engineering. Underestimates lead to time pressures that may compromise full functional development and thorough testing of software. In contrast, overestimates can result in noncompetitive contract bids and/or overallocation of development resources and personnel. As a result, many models for estimating software development effort have been proposed. This article describes two machine learning methods, which we use to build estimators of software development effort from historical data. Our experiments indicate that these techniques are competitive with traditional estimators on one dataset, but they also show that these methods are sensitive to the data on which they are trained. This cautionary note applies to any model-construction strategy that relies on historical data. All such models for software effort estimation should be evaluated by exploring model sensitivity on a variety of historical data.
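The closing recommendation, evaluating an effort model on more than one body of historical data, can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper:

- The only feature is project size in KLOC.
- The estimator is a small hand-written CART-style regression tree, used as a stand-in. It is not the authors' implementation.
- Error is scored with mean magnitude of relative error (MMRE).
- The data for the two hypothetical organizations, `org_a` and `org_b`, is invented.

The names `build_tree`, `predict`, `mmre`, `org_a`, `org_a_new` and `org_b` are all illustrative.

```python
from statistics import mean
from typing import List, Sequence, Tuple, Union

# A row is (feature vector, actual effort). Assumption: the only feature is size in KLOC.
Row = Tuple[Sequence[float], float]
# A tree is either a leaf (predicted effort) or (feature index, threshold, left, right).
Tree = Union[float, tuple]


def sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean: the CART regression impurity."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows: List[Row], max_depth: int = 2, min_leaf: int = 2) -> Tree:
    """Greedily grow a small regression tree, choosing at each node the split
    that most reduces the summed squared error of the two children."""
    ys = [y for _, y in rows]
    if max_depth == 0 or len(rows) < 2 * min_leaf:
        return mean(ys)
    best = None  # (cost, feature index, threshold)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between observed values
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None or best[0] >= sse(ys):
        return mean(ys)  # no split reduces error: make a leaf
    _, f, t = best
    return (f, t,
            build_tree([r for r in rows if r[0][f] <= t], max_depth - 1, min_leaf),
            build_tree([r for r in rows if r[0][f] > t], max_depth - 1, min_leaf))


def predict(tree: Tree, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return that leaf's mean effort."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def mmre(tree: Tree, rows: List[Row]) -> float:
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return mean(abs(y - predict(tree, x)) / y for x, y in rows)


# Invented data: (size in KLOC,) -> effort in person-months.
org_a = [((s,), 3.0 * s) for s in range(10, 90, 10)]     # training history from A
org_a_new = [((s,), 3.0 * s) for s in (12, 38, 57, 73)]  # unseen projects from A
org_b = [((s,), 6.0 * s) for s in range(10, 90, 10)]     # projects from organization B

tree = build_tree(org_a)
within = mmre(tree, org_a_new)
across = mmre(tree, org_b)
print(f"MMRE on new projects from A: {within:.2f}")
print(f"MMRE on projects from B:     {across:.2f}")
```

With these invented numbers, the script prints an MMRE of 0.10 for unseen projects from A and 0.48 for projects from B. The same trained model looks accurate or poor depending on which historical data it is checked against, which is the kind of sensitivity the abstract warns about.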

Index Terms:
Software development effort, machine learning, decision trees, regression trees, neural networks
Krishnamoorthy Srinivasan, Douglas Fisher, "Machine Learning Approaches to Estimating Software Development Effort," IEEE Transactions on Software Engineering, vol. 21, no. 2, pp. 126-137, Feb. 1995, doi:10.1109/32.345828