Bayesian Analysis of Empirical Software Engineering Cost Models
July/August 1999 (vol. 25 no. 4)
pp. 573-583

Abstract—To date many software engineering cost models have been developed to predict the cost, schedule, and quality of the software under development. But, the rapidly changing nature of software development has made it extremely difficult to develop empirical models that continue to yield high prediction accuracies. Software development costs continue to increase and practitioners continually express their concerns over their inability to accurately predict the costs involved. Thus, one of the most important objectives of the software engineering community has been to develop useful models that constructively explain the software development life-cycle and accurately predict the cost of developing a software product. To that end, many parametric software estimation models have evolved in the last two decades [25], [17], [26], [15], [28], [1], [2], [33], [7], [10], [22], [23].

Almost all of the above-mentioned parametric models have been empirically calibrated to actual data from completed software projects. The most commonly used calibration technique has been classical multiple regression. As discussed in this paper, multiple regression imposes a few assumptions that software engineering datasets frequently violate. The source data are also generally imprecise in reporting size, effort, and cost-driver ratings, particularly across different organizations. The result is inaccurate empirical models that do not perform well when used for prediction. This paper illustrates the problems faced by the multiple regression approach during the calibration of one of the popular software engineering cost models, COCOMO II. It describes the pragmatic 10 percent weighted average approach that was used for the first publicly available calibrated version [6]. It then shows how a more sophisticated Bayesian approach can alleviate some of the problems faced by multiple regression. It compares and contrasts the two empirical approaches and concludes that the Bayesian approach was better and more robust than the multiple regression approach.

Bayesian analysis is a well-defined and rigorous process of inductive reasoning that has been used in many scientific disciplines (the reader can refer to [11], [35], [3] for a broader understanding of the Bayesian analysis approach). A distinctive feature of the Bayesian approach is that it permits the investigator to use both sample (data) and prior (expert-judgment) information in a logically consistent manner in making inferences. This is done by using Bayes' theorem to produce a 'postdata' or posterior distribution for the model parameters. Using Bayes' theorem, prior (or initial) values are transformed to postdata views. This transformation can be viewed as a learning process. The posterior distribution is determined by the variances of the prior and sample information. If the variance of the prior information is smaller than the variance of the sample information, then a higher weight is assigned to the prior information, causing the posterior estimate to be closer to the prior information. On the other hand, if the variance of the sample information is smaller than the variance of the prior information, then a higher weight is assigned to the sample information, causing the posterior estimate to be closer to the sample information.
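
The variance weighting described above can be illustrated for the simplest case: a single parameter with a normal prior and a normal sample estimate. The following is a minimal sketch under that assumption, not the calibration procedure used in the paper; the function name and all numbers are hypothetical.

```python
def posterior_normal(prior_mean, prior_var, sample_mean, sample_var):
    """Combine a normal prior and a normal sample estimate.

    Each source is weighted by its precision (1 / variance), so the
    source with the smaller variance pulls the posterior mean toward it.
    Assumes a single parameter; hypothetical helper for illustration.
    """
    if prior_var <= 0 or sample_var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior_var
    sample_precision = 1.0 / sample_var
    post_var = 1.0 / (prior_precision + sample_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + sample_precision * sample_mean
    )
    return post_mean, post_var


# Hypothetical values: the expert prior is tighter than the sample estimate.
post_mean, post_var = posterior_normal(
    prior_mean=1.0, prior_var=0.01, sample_mean=2.0, sample_var=0.04
)
print(post_mean, post_var)
```

With these made-up inputs the prior has four times the precision of the sample, so the posterior mean lands at roughly 1.2, much nearer the prior value of 1.0 than the sample value of 2.0. Swapping the two variances moves it to roughly 1.8 instead.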

The Bayesian approach discussed in this paper enables stronger solutions to one of the biggest problems faced by the software engineering community: the challenge of making good decisions using data that is usually scarce and incomplete. We note that the predictive performance of the Bayesian approach (i.e., within 30 percent of the actuals 75 percent of the time) is significantly better than that of the previous multiple regression approach (i.e., within 30 percent of the actuals only 52 percent of the time) on our latest sample of 161 project datapoints.
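
The accuracy figures quoted above ("within 30 percent of the actuals N percent of the time") correspond to a simple hit-rate calculation. The sketch below shows one way to compute such a figure; the function name and the four project values are hypothetical and are not drawn from the 161-project sample.

```python
def pred(actuals, estimates, level=0.30):
    """Fraction of estimates whose relative error is within `level`.

    Relative error is |estimate - actual| / actual. With level=0.30
    this is the share of projects estimated within 30 percent of the
    actuals. Hypothetical helper for illustration.
    """
    if len(actuals) != len(estimates) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for a, e in zip(actuals, estimates) if abs(e - a) / a <= level
    )
    return hits / len(actuals)


# Made-up effort values (for example, person-months) for four projects.
actual_effort = [100.0, 200.0, 50.0, 400.0]
estimated_effort = [120.0, 150.0, 80.0, 390.0]
print(pred(actual_effort, estimated_effort))  # 0.75
```

Here three of the four estimates fall within 30 percent of the actual value, so the result is 0.75.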

[1] B. Boehm, Software Engineering Economics, Prentice Hall, Upper Saddle River, N.J., 1981, pp. 533-535.
[2] B.W. Boehm et al., “Cost Models for Future Software Life Cycle Processes: COCOMO 2.0,” Annals of Software Eng. Special Volume on Software Process and Product Measurement, J.D. Arthur and S.M. Henry, eds., vol. 1, pp. 45-60, Amsterdam, The Netherlands: J.C. Baltzer AG, Science, 1995.
[3] G. Box and G. Tiao, Bayesian Inference in Statistical Analysis. Addison-Wesley, 1973.
[4] L.C. Briand, V.R. Basili, and W.M. Thomas, “A Pattern Recognition Approach for Software Engineering Data Analysis,” IEEE Trans. Software Eng., vol. 18, no. 11, pp. 931-942, 1992.
[5] S. Chulani, “Incorporating Bayesian Analysis to Improve the Accuracy of COCOMO II and Its Quality Model Extension,” Qualifying Exam Report, Computer Science Dept., USC Center for Software Eng., Feb. 1998.
[6] S. Chulani, B. Clark, and B. Boehm, “Calibration Results of COCOMOII.1997,” Proc. Int'l Conf. Software Eng., Apr. 1998.
[7] S.D. Conte, Software Engineering Metrics and Models. Menlo Park, Calif.: Benjamin/Cummings, 1986.
[8] D. Cook and S. Weisberg, An Introduction to Regression Graphics. Wiley Series, 1994.
[9] A. Cuelenaere, M. van Genuchten, and F. Heemstra, “Calibrating a Software Cost Estimation Model: Why and How,” Information and Software Technology, vol. 20, no. 10, pp. 558-567, 1987.
[10] N.E. Fenton, Software Metrics, A Rigorous Approach. Chapman & Hall, 1991.
[11] A. Gelman, J. Carlin, H. Stern, and D. Rubin, Bayesian Data Analysis. Chapman & Hall, 1995.
[12] O. Helmer, Social Technology. New York: Basic Books, 1966.
[13] International Function Point Users Group (IFPUG), Function Point Counting Practices Manual, Release 4.0, 1994.
[14] D.R. Jeffery and G.C. Low, “Calibrating Estimation Tools for Software Development,” Software Eng. J. vol. 5, pp. 215-221, 1990.
[15] R.W. Jensen, “An Improved Macrolevel Software Development Resource Estimation Model,” Proc. Fifth ISPA Conf., pp. 88-92, Apr. 1983.
[16] E.J. Johnson, “Expertise and Decision Under Uncertainty: Performance and Process,” The Nature of Expertise, Chi, Glaser, and Farr, eds., Lawrence Erlbaum Assoc. 1988.
[17] C. Jones, Applied Software Measurement: Assuring Productivity and Quality, 2nd ed., McGraw-Hill, New York, 1997.
[18] G.G. Judge, W. Griffiths, and R. Carter Hill, Learning and Practicing Econometrics. Wiley, 1993.
[19] C. Kemerer, “An Empirical Validation of Software Cost Estimation Models,” Comm. ACM, vol. 30, pp. 416-429, May 1987.
[20] B.A. Kitchenham and N.R. Taylor, “Software Cost Models,” ICL Technical J. vol. 1, May 1984.
[21] E.E. Leamer, Specification Searches, ad hoc Inference with Nonexperimental Data. Wiley Series 1978.
[22] T.F. Masters, “An Overview of Software Cost Estimating at the National Security Agency,” J. Parametrics, vol. 5, no. 1, pp. 72-84, 1985.
[23] S.N. Mohanty, “Software Cost Estimation: Present and Future,” Software Practice and Experience, vol. 11, pp. 103-121, 1981.
[24] G.M. Mullet, “Why Regression Coefficients Have the Wrong Sign,” J. Quality Technology, 1976.
[25] L.H. Putnam and W. Myers, Measures for Excellence. Yourdon Press Computing Series, 1992. http://www.qsm.comslim_estimate.html
[26] R.M. Park et al., “Software Size Measurement: A Framework for Counting Source Statements,” CMU-SEI-92-TR-20, Software Eng. Inst., Pittsburgh, Pa. 1992.
[27] J.S. Poulin, Measuring Software Reuse: Principles, Practices and Economic Models. Addison-Wesley, 1997.
[28] H. Rubin, “ESTIMACS,” IEEE, 1983.
[29] M.J. Shepperd and C. Schofield, “Estimating Software Project Effort Using Analogies,” IEEE Trans. Software Eng., vol. 23, pp. 736-743, 1997.
[30] Center for Software Engineering, “COCOMO II Cost Estimation Questionnaire,” Computer Science Dept., USC Center for Software Eng., 1997. http://sunset.usc.eduCocomo.html
[31] Center for Software Engineering, “COCOMO II Model Definition Manual,” Computer Science Dept., USC Center for Software Eng., 1997. http://sunset.usc.eduCocomo.html
[32] S. Vicinanza, T. Mukhopadhyay, and M. Prietula, “Software Effort Estimation: An Exploratory Study of Expert Performance,” Information Systems Research, vol. 2, no. 4, pp. 243-262, 1991.
[33] F. Walkerden and D. Ross Jeffery, “Software Cost Estimation: A Review of Models, Process and Practices,” Advances in Computers, 1997.
[34] S. Weisberg, Applied Linear Regression, second ed., New York: John Wiley & Sons, 1985.
[35] “Applications of Bayesian Analysis and Econometrics,” The Statistician, vol. 132, pp. 23-34, 1983.

Index Terms:
Bayesian analysis, multiple regression, software estimation, software engineering cost models, model calibration, prediction accuracy, empirical modeling, COCOMO, measurement, metrics, project management.
Sunita Chulani, Barry Boehm, Bert Steece, "Bayesian Analysis of Empirical Software Engineering Cost Models," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 573-583, July-Aug. 1999, doi:10.1109/32.799958