Issue No.02 - March/April (2009 vol.35)
pp: 293-304
A. Güneş Koru, University of Maryland Baltimore County, Baltimore
Dongsong Zhang, University of Maryland Baltimore County, Baltimore
Khaled El Emam, University of Ottawa, Ottawa
Hongfang Liu, Georgetown University, Washington
ABSTRACT
The importance of the relationship between the size and defect proneness of software modules is well recognized. Understanding the nature of that relationship can facilitate various development decisions related to prioritization of quality assurance activities. Overall, the previous research only drew a general conclusion that there was a monotonically increasing relationship between module size and defect proneness. In this study, we analyzed class-level size and defect data in order to increase our understanding of this crucial relationship. We studied four large-scale object-oriented products, Mozilla, Cn3d, JBoss, and Eclipse. We observed that defect proneness increased as class size increased, but at a slower rate; smaller classes were proportionally more problematic than larger classes. Therefore, practitioners should consider giving higher priority to smaller modules when planning focused quality assurance activities with limited resources. For example, in Mozilla and Eclipse, an inspection strategy investing 80 percent of available resources on 100-LOC classes and the rest on 1,000-LOC classes would be more than twice as cost-effective as the opposite strategy. These results should be immediately useful to guide focused quality-assurance activities in large-scale software projects.
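The 80/20 comparison in the abstract can be illustrated with a rough sketch. Everything in it beyond the class sizes and the 80/20 split is an assumption and is not taken from the paper: defect proneness is modeled as proportional to size raised to a power `beta` below 1, inspection cost is taken as proportional to LOC, and the value `beta = 0.4` is purely hypothetical. Under those assumptions, the expected defects exposed per inspected line fall as class size grows, which is what makes the small-class strategy pay off.

```python
# Illustrative sketch only. The power-law form and beta = 0.4 are assumptions
# made for this example; they are not the fitted model or values from the paper.

def defects_per_loc(size_loc: float, beta: float) -> float:
    """Relative defect proneness per line for a class of the given size.

    Assumes proneness is proportional to size_loc ** beta, so proneness
    per inspected line is proportional to size_loc ** (beta - 1).
    """
    return size_loc ** (beta - 1.0)


def strategy_yield(share_small: float, small_loc: float,
                   large_loc: float, beta: float) -> float:
    """Relative yield of a budget split between small and large classes.

    share_small is the fraction of the inspection budget (measured in LOC
    inspected) spent on classes of size small_loc; the rest goes to
    classes of size large_loc.
    """
    return (share_small * defects_per_loc(small_loc, beta)
            + (1.0 - share_small) * defects_per_loc(large_loc, beta))


def cost_effectiveness_ratio(beta: float, small_loc: float = 100.0,
                             large_loc: float = 1000.0,
                             share: float = 0.8) -> float:
    """Yield of the small-class-first split divided by the opposite split."""
    favour_small = strategy_yield(share, small_loc, large_loc, beta)
    favour_large = strategy_yield(1.0 - share, small_loc, large_loc, beta)
    return favour_small / favour_large


if __name__ == "__main__":
    hypothetical_beta = 0.4  # assumed exponent, chosen only for illustration
    ratio = cost_effectiveness_ratio(hypothetical_beta)
    print(f"small-first vs. large-first yield ratio: {ratio:.2f}")
```

With this functional form the ratio exceeds 2 only when `beta` is below roughly 0.46, and it equals 1 when `beta` is 1 (proneness strictly proportional to size), so the advantage of prioritizing small classes depends entirely on how sublinear the size-defect relationship is.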
INDEX TERMS
Software science, Product metrics, Planning for SQA, Measurement applied to SQA, Software Quality/SQA, Software Engineering, Software/Software Engineering, Open-source software
CITATION
A. Güneş Koru, Dongsong Zhang, Khaled El Emam, Hongfang Liu, "An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules", IEEE Transactions on Software Engineering, vol.35, no. 2, pp. 293-304, March/April 2009, doi:10.1109/TSE.2008.90