The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics
July 2001 (vol. 27 no. 7)
pp. 630-650

Abstract—Much effort has been devoted to the development and empirical validation of object-oriented metrics. The empirical validations performed thus far would suggest that a core set of validated metrics is close to being identified. However, none of these studies allow for the potentially confounding effect of class size. In this paper, we demonstrate a strong size confounding effect and question the results of previous object-oriented metrics validation studies. We first investigate whether there is a confounding effect of class size in validation studies of object-oriented metrics and show that, based on previous work, there is reason to believe such an effect exists. We then describe a detailed empirical methodology for identifying these effects. Finally, we perform a study on a large C++ telecommunications framework to examine whether size is really a confounder. This study considered the Chidamber and Kemerer metrics and a subset of the Lorenz and Kidd metrics. The dependent variable was the incidence of a fault attributable to a field failure (the fault-proneness of a class). Our findings indicate that, before controlling for size, the results are very similar to those of previous studies: the metrics expected to be validated are indeed associated with fault-proneness. After controlling for size, however, none of the metrics we studied remained associated with fault-proneness. This demonstrates a strong size confounding effect and casts doubt on the results of previous object-oriented metrics validation studies. We recommend that previous validation studies be reexamined to determine whether their conclusions still hold after controlling for size, and that future validation studies always control for size.
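The confounding the abstract describes, a metric that appears associated with fault-proneness only because both it and fault-proneness track class size, can be sketched with simulated data. Everything below (the lognormal size distribution, the fault model, the size-stratified re-analysis) is an illustrative assumption, not the paper's actual procedure, which used logistic regression on real failure data:

```python
import random
import statistics

random.seed(0)

# Simulate 2,000 classes. Size (LOC) drives fault-proneness directly;
# the metric (think of a coupling or method count) merely tracks size,
# so any metric-fault association is pure confounding.
n = 2000
size = [random.lognormvariate(4.0, 0.8) for _ in range(n)]
metric = [s / 20 + random.gauss(0, 2) for s in size]
fault = [1 if random.random() < min(0.9, s / 500) else 0 for s in size]

def point_biserial(x, y):
    """Correlation between numeric x and a binary outcome y."""
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    p = len(x1) / len(x)
    return ((statistics.mean(x1) - statistics.mean(x0))
            / statistics.pstdev(x)) * (p * (1 - p)) ** 0.5

# Unadjusted analysis: the metric looks strongly "validated".
r_raw = point_biserial(metric, fault)
print(f"unadjusted correlation: {r_raw:.2f}")

# Adjusted analysis: stratify on size, then re-test within each stratum,
# where size is nearly constant and cannot confound.
order = sorted(range(n), key=lambda i: size[i])
within = []
for k in range(0, n, 200):  # ten size strata of 200 classes each
    idx = order[k:k + 200]
    f = [fault[i] for i in idx]
    if 0 < sum(f) < len(f):  # need both outcomes present to correlate
        within.append(point_biserial([metric[i] for i in idx], f))
r_adj = statistics.mean(within)
print(f"mean within-stratum correlation: {r_adj:.2f}")
```

The raw association is sizable while the within-stratum association collapses toward zero. The same disappearance shows up if size is instead entered as a covariate in a logistic regression, which is closer to the analysis the paper reports.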

[1] M.A. de Almeida, H. Lounis, and W.L. Melo, “An Investigation on the Use of Machine Learned Models for Estimating Correction Costs,” Proc. IEEE Int'l Conf. Software Eng., 1998.
[2] A.L. Baker, J.M. Bieman, N. Fenton, D.A. Gustafson, A. Melton, and R. Whitty, “Philosophy for Software Measurement,” J. System Software, vol. 12, no. 3, pp. 277–281, 1990.
[3] V. Basili, S. Condon, K. El Emam, R. Hendrick, and W. Melo, “Characterizing and Modeling the Cost of Rework in a Library of Reusable Software Components,” Proc. 19th Int'l Conf. Software Eng., 1997.
[4] V.R. Basili, L.C. Briand, and W. Melo, "A Validation of Object-Oriented Design Metrics as Quality Indicators," IEEE Trans. Software Eng., Oct. 1996, pp. 751-761.
[5] D. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley and Sons, 1980.
[6] D. Belsley, Conditioning Diagnostics: Collinearity and Weak Data in Regression. John Wiley and Sons, 1991.
[7] D. Belsley, “A Guide to Using the Collinearity Diagnostics,” Computer Science in Economics and Management, vol. 4, pp. 33-50, 1991.
[8] S. Benlarbi and W. Melo, “Polymorphism Measures for Early Risk Prediction,” Proc. 21st Int'l Conf. Software Eng., ICSE '99, pp. 335-344, 1999.
[9] A. Binkley and S. Schach, “Prediction of Run-Time Failures Using Static Product Quality Metrics,” Software Quality J., vol. 7, pp. 141-147, 1998.
[10] A. Binkley and S. Schach, “Validation of the Coupling Dependency Metric as a Predictor of Run-Time Failures and Maintenance Measures,” Proc. 20th Int'l Conf. Software Eng., 1998.
[11] D. Boehm-Davis, R. Holt, and A. Schultz, “The Role of Program Structure in Software Maintenance,” Int'l J. Man-Machine Studies, vol. 36, pp. 21-63, 1992.
[12] N. Breslow and N. Day, Statistical Methods in Cancer Research—The Analysis of Case Control Studies, vol. 1. Int'l Agency for Research on Cancer, 1980.
[13] L. Briand, W. Thomas, and C. Hetmanski, “Modeling and Managing Risk Early in Software Development,” Proc. Int'l Conf. Software Eng., pp. 55-65, 1993.
[14] L.C. Briand, V.R. Basili, and C.J. Hetmanski, “Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components,” IEEE Trans. Software Eng., vol. 19, no. 11, pp. 1028-1044, Nov. 1993.
[15] L. Briand, K. El Emam, and S. Morasca, “Theoretical and Empirical Validation of Software Product Measures,” Technical Report ISERN-95-03, Int'l Software Eng. Research Network, 1995.
[16] L. Briand, J. Daly, and J. Wüst, “A Unified Framework for Cohesion Measurement in Object-Oriented Systems,” Empirical Software Eng.: An Int'l J., vol. 3, no. 1, pp. 65-117, 1998.
[17] L. Briand, P. Devanbu, and W. Melo, “An Investigation into Coupling Measures for C++,” Proc. 19th Int'l Conf. Software Eng., ICSE '97, Boston, pp. 412-421, May 1997.
[18] L. Briand, J. Daly, V. Porter, and J. Wüst, “Predicting Fault-Prone Classes with Design Measures in Object-Oriented Systems,” Proc. Ninth Int'l Symp. Software Reliability Eng., ISSRE '98, Paderborn, Germany, Nov. 1998.
[19] L. Briand, J. Wüst, S. Ikonomovski, and H. Lounis, “A Comprehensive Investigation of Quality Factors in Object-Oriented Designs: An Industrial Case Study,” Technical Report ISERN-98-29, Int'l Software Eng. Research Network, 1998.
[20] L. Briand, J. Daly, and J. Wüst, “A Unified Framework for Cohesion Measurement in Object-Oriented Systems,” Empirical Software Eng.: An Int'l J., vol. 3, no. 1, pp. 65-117, 1998.
[21] L. Briand, J. Daly, and J. Wüst, “A Unified Framework for Coupling Measurement in Object-Oriented Systems,” IEEE Trans. Software Eng., vol. 25, no. 1, pp. 91-121, 1999.
[22] L. Briand, J. Wüst, J. Daly, and V. Porter, “Exploring the Relationships between Design Measures and Software Quality in Object-Oriented Systems,” J. Systems and Software, vol. 51, 2000.
[23] L. Briand, E. Arisholm, S. Counsell, F. Houdek, and P. Thevenod-Fosse, “Empirical Studies of Object-Oriented Artifacts, Methods, and Processes: State of the Art and Future Direction,” Empirical Software Eng., vol. 4, no. 4, pp. 387-404, 1999.
[24] F. Brito e Abreu and R. Carapuca, “Object-Oriented Software Engineering: Measuring and Controlling the Development Process,” Proc. Fourth Int'l Conf. Software Quality, 1994.
[25] F. Brito e Abreu and W. Melo, “Evaluating the Impact of OO Design on Software Quality,” Proc. Third Int'l Software Metrics Symp., 1996.
[26] M. Cartwright, “An Empirical View of Inheritance,” Information and Software Technology, vol. 40, pp. 795-799, 1998.
[27] M. Cartwright and M. Shepperd, "An Empirical Investigation of an Object-Oriented Software System," IEEE Trans. Software Eng., vol. 26, no. 8, Aug. 2000, pp. 786-796.
[28] S. Chatterjee and B. Price, Regression Analysis by Example. John Wiley and Sons, 1991.
[29] S.R. Chidamber and C.F. Kemerer, "Towards a Metrics Suite for Object Oriented Design," A. Paepcke, ed., Proc. Conf. Object-Oriented Programming: Systems, Languages and Applications, OOPSLA'91, Oct. 1991. Also published in SIGPLAN Notices, vol. 26, no. 11, pp. 197-211, 1991.
[30] S.R. Chidamber and C.F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, 1994.
[31] S. Chidamber and C. Kemerer, “Authors' Reply,” IEEE Trans. Software Eng., vol. 21, no. 3, p. 265, Mar. 1995.
[32] S. Chidamber, D. Darcy, and C. Kemerer, “Managerial use of Metrics for Object-Oriented Software: An Exploratory Analysis,” IEEE Trans. Software Eng., vol. 24, no. 8, pp. 629-639, Aug. 1998.
[33] N.I. Churcher and M.J. Shepperd, "Comments on 'A Metrics Suite for Object-Oriented Design,'" IEEE Trans. Software Eng., vol. 21, no. 3, pp. 263-265, 1995.
[34] F. Coallier, J. Mayrand, and B. Lague, “Risk Management in Software Product Procurement,” Elements of Software Process Assessment and Improvement, K. El Emam and N.H. Madhavji, eds., 1999.
[35] R. Cook, “Detection of Influential Observations in Linear Regression,” Technometrics, vol. 19, pp. 15-18, 1977.
[36] R. Cook, “Influential Observations in Linear Regression,” J. Amer. Statistical Assoc., vol. 74, pp. 169-174, 1979.
[37] D. Cox and N. Wermuth, “A Comment on the Coefficient of Determination for Binary Responses,” The Amer. Statistician, vol. 46, pp. 1-4, 1992.
[38] L. Dales and H. Ury, “An Improper Use of Statistical Significance Testing in Studying Covariables,” Int'l J. Epidemiology, vol. 7, no. 4, pp. 373-375, 1978.
[39] J. Daly, J. Miller, A. Brooks, M. Roper, and M. Wood, “Issues on the Object-Oriented Paradigm: A Questionnaire Survey,” Research Report EFoCS-8-95, Dept. of Computer Science, Univ. of Strathclyde, 1995.
[40] J. Daly, M. Wood, A. Brooks, J. Miller, and M. Roper, “Structured Interviews on the Object-Oriented Paradigm,” Research Report EFoCS-7-95, Dept. of Computer Science, Univ. of Strathclyde, 1995.
[41] J. Daly, A. Brooks, J. Miller, M. Roper, and M. Wood, “Evaluating Inheritance Depth on the Maintainability of Object-Oriented Software,” Empirical Software Eng., vol. 1, no. 2, pp. 109-132, 1996.
[42] C. Davies, J. Hyde, S. Bangdiwala, and J. Nelson, “An Example of Dependencies Among Variables in a Conditional Logistic Regression,” Modern Statistical Methods in Chronic Disease Epidemiology, S. Moolgavkar and R. Prentice, eds., John Wiley and Sons, 1986.
[43] I. Deligiannis and M. Shepperd, “A Review of Experimental Investigations into Object-Oriented Technology,” Proc. Fifth IEEE Workshop Empirical Studies of Software Maintenance, pp. 6-10, 1999.
[44] S. Derksen and H. Keselman, “Backward, Forward and Stepwise Automated Subset Selection Algorithms: Frequency of Obtaining Authentic and Noise Variables,” British J. Mathematical and Statistical Psychology, vol. 45, pp. 265-282, 1992.
[45] J. Dvorak, “Conceptual Entropy and its Effect on Class Hierarchies,” Computer, pp. 59–63, 1994.
[46] C. Ebert and T. Liedtke, “An Integrated Approach to Criticality Prediction,” Proc. Sixth Int'l Symp. Software Reliability Eng., pp. 14–23, 1995.
[47] B. Everitt, The Analysis of Contingency Tables. Chapman and Hall, 1992.
[48] N. Fenton, “Software Metrics: Theory, Tools and Validation,” Software Eng. J., pp. 65-78, Jan. 1990.
[49] N.E. Fenton and M. Neil, “A Critique of Software Defect Prediction Models,” IEEE Trans. Software Eng., vol. 25, no. 5, pp. 675-689, Sept./Oct. 1999.
[50] N. Fenton and M. Neil, “Software Metrics: Successes, Failures, and New Directions,” J. Systems and Software, vol. 47, pp. 149-157, 1999.
[51] N. Fenton and N. Ohlsson, “Quantitative Analysis of Faults and Failures in a Complex Software System,” IEEE Trans. Software Eng., vol. 26, no. 8, pp. 797-814, Aug. 2000.
[52] L. Gordis, Epidemiology. W.B. Saunders, 1996.
[53] F. Harrell and K. Lee, “Regression Modelling Strategies for Improved Prognostic Prediction,” Statistics in Medicine, vol. 3, pp. 143-152, 1984.
[54] F. Harrell, K. Lee, and D. Mark, “Multivariate Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors,” Statistics in Medicine, vol. 15, pp. 361-387, 1996.
[55] R. Harrison, L. Samaraweera, M. Dobie, and P. Lewis, “An Evaluation of Code Metrics for Object-Oriented Programs,” Information and Software Technology, vol. 38, pp. 443-450, 1996.
[56] W. Harrison, “Using Software Metrics to Allocate Testing Resources,” J. Management Information Systems, vol. 4, no. 4, pp. 93-105, 1988.
[57] R. Harrison, S. Counsell, and R. Nithi, “Coupling Metrics for Object Oriented Design,” Proc. Fifth Metrics Symp., pp. 150-157, Nov. 1998.
[58] L. Hatton, “Is Modularization Always a Good Idea?” Information and Software Technology, vol. 38, pp. 719-721, 1996.
[59] L. Hatton, "Does OO Sync with How We Think?" IEEE Software, May/June 1998, pp. 46-54.
[60] B. Henderson-Sellers, Software Metrics. Prentice-Hall, 1996.
[61] D. Hosmer and S. Lemeshow, Applied Logistic Regression. John Wiley and Sons, 1989.
[62] J.P. Hudepohl et al., “Emerald: Software Metrics and Models on the Desktop,” IEEE Software, vol. 13, no. 5, pp. 56-60, Sept. 1996.
[63] J. Hudepohl, S. Aud, T. Khoshgoftaar, E. Allen, and J. Mayrand, “Integrating Metrics and Models for Software Risk Assessment,” Proc. Seventh Int'l Symp. Software Reliability Eng., pp. 93-98, 1996.
[64] ISO/IEC 9126, Information Technology—Software Product Evaluation—Quality Characteristics and Guidelines for their Use. Int'l Organization for Standardization and the Int'l Electrotechnical Commission, 1991.
[65] ISO/IEC DIS 14598-1, Information Technology—Software Product Evaluation; Part 1: Overview. Int'l Organization for Standardization and the Int'l Electrotechnical Commission, 1996.
[66] M. Jørgensen, “Experience with the Accuracy of Software Maintenance Task Effort Prediction Models,” IEEE Trans. Software Eng., vol. 21, no. 8, pp. 674–681, Aug. 1995.
[67] M. Kaâniche and K. Kanoun, “Reliability of a Telecommunications System,” Proc. Seventh Int'l Symp. Software Reliability Eng., pp. 207–212, 1996.
[68] J. Kearney, R. Sedlmeyer, W. Thompson, M. Gray, and M. Adler, “Software Complexity Measurement,” Comm. ACM, vol. 29, no. 11, pp. 1044-1050, 1986.
[69] T. Khoshgoftaar, E. Allen, K. Kalaichelvan, and N. Goel, “The Impact of Software Evolution and Reuse on Software Quality,” Empirical Software Eng.: An Int'l J., vol. 1, pp. 31-44, 1996.
[70] T. Khoshgoftaar, E. Allen, R. Halstead, G. Trio, and R. Flass, “Process Measures for Predicting Software Quality,” Computer, vol. 31, no. 4, pp. 66-72, Apr. 1998.
[71] T. Khoshgoftaar, E. Allen, W. Jones, and J. Hudepohl, “Which Software Modules Have Faults that will be Discovered by Customers?” J. Software Maintenance: Research and Practice, vol. 11, no. 1, pp. 1-18, 1999.
[72] T. Khoshgoftaar, E. Allen, W. Jones, and J. Hudepohl, “Classification Tree Models of Software Quality Over Multiple Releases,” Proc. Int'l Symp. Software Reliability Eng., pp. 116-125, 1999.
[73] B.A. Kitchenham, S.L. Pfleeger, and N. Fenton, “Towards a Framework for Software Measurement Validation,” IEEE Trans. Software Eng., vol. 21, no. 12, pp. 929-944, Dec. 1995.
[74] D. Kleinbaum, L. Kupper, and H. Morgenstern, Epidemiologic Research: Principles and Quantitative Methods. Van Nostrand Reinhold, 1982.
[75] F. Lanubile and G. Visaggio, “Evaluating Predictive Quality Models Derived from Software Measures: Lessons Learned,” J. Systems and Software, vol. 38, pp. 225-234, 1997.
[76] Y. Lee, B. Liang, S. Wu, and F. Wang, “Measuring the Coupling and Cohesion of an Object-Oriented Program Based on Information Flow,” Proc. Int'l Conf. Software Quality, 1995.
[77] M. Lejter, S. Meyers, and S.P. Reiss, “Support for Maintaining Object-Oriented Programs,” IEEE Trans. Software Eng., vol. 18, no. 12, pp. 1045-1052, Dec. 1992.
[78] W. Li and S. Henry, “Object-Oriented Metrics that Predict Maintainability,” J. Systems Software, vol. 23, no. 2, pp. 111-122, 1993.
[79] R. Lindsay and A. Ehrenberg, “The Design of Replicated Studies,” The Amer. Statistician, vol. 47, no. 3, pp. 217-228, 1993.
[80] M. Lorenz and J. Kidd, Object-Oriented Software Metrics. Prentice Hall, 1994.
[81] M. Lyu, J. Yu, E. Keramidas, and S. Dalal, “Armor: Analyzer for Reducing Module Operational Risk,” Proc. 25th Int'l Symp. Fault-Tolerant Computing, pp. 137-142, June 1995.
[82] G. Maddala, Limited-Dependent and Qualitative Variables in Econometrics. Cambridge Univ. Press, 1983.
[83] R. Mason and R. Gunst, “Outlier-Induced Collinearities,” Technometrics, vol. 27, pp. 401-407, 1985.
[84] S. Menard, Applied Logistic Regression Analysis. Sage Publications, 1995.
[85] S. Menard, “Coefficients of Determination for Multiple Logistic Regression Analysis,” The Amer. Statistician, vol. 54, no. 1, pp. 17-24, 2000.
[86] K.-H. Moller and D. Paulish, “An Empirical Investigation of Software Fault Distribution,” Proc. First Int'l Software Metrics Symp., 1993.
[87] J.C. Munson and T.M. Khoshgoftaar, "The Detection of Fault-Prone Programs," IEEE Trans. Software Eng., vol. 18, May 1992.
[88] N. Nagelkerke, “A Note on a General Definition of the Coefficient of Determination,” Biometrika, vol. 78, no. 3, pp. 691-692, 1991.
[89] P. Nesi and T. Querci, "Effort Estimation and Prediction of Object-Oriented Systems," J. Systems and Software, to appear, 1998.
[90] J. Neter, W. Wasserman, and M. Kutner, Applied Linear Statistical Models. Irwin, 1990.
[91] N. Ohlsson and H. Alberg, “Predicting Error-Prone Software Modules in Telephone Switches,” IEEE Trans. Software Eng., vol. 22, no. 12, pp. 886–894, Dec. 1996.
[92] E. Pedhazur, Multiple Regression in Behavioral Research. Harcourt Brace Jovanovich, 1982.
[93] D. Pregibon, “Logistic Regression Diagnostics,” The Annals of Statistics, vol. 9, no. 4, pp. 705-724, 1981.
[94] R.S. Pressman, Software Engineering: A Practitioner's Approach, fourth ed. New York: McGraw-Hill, 1997.
[95] K. Rothman and S. Greenland, Modern Epidemiology. Lippincott-Raven, 1998.
[96] P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection. Wiley Series in Probability and Statistics, 1987.
[97] R. Schaefer, L. Roi, and R. Wolfe, “A Ridge Logistic Estimator,” Comm. Statistics—Theory and Methods, vol. 13, no. 1, pp. 99-113, 1984.
[98] R. Schaefer, “Alternative Estimators in Logistic Regression when the Data are Collinear,” The J. Statistical Computation and Simulation, vol. 25, pp. 75-91, 1986.
[99] J. Schlesselman, Case-Control Studies: Design, Conduct, Analysis. Oxford Univ. Press, 1982.
[100] D. Schmidt and P. Stephenson, “Experiences Using Design Patterns to Evolve System Software Across Diverse OS Platforms,” Proc. Ninth European Conf. Object Oriented Programming, 1995.
[101] D. Schmidt, “A System of Reusable Design Patterns for Communication Software,” The Theory and Practice of Object Systems, S. Berzuk ed., 1995.
[102] D. Schmidt, “Using Design Patterns to Develop Reusable Object-Oriented Communication Software,” Comm. ACM, vol. 38, pp. 65-74, 1995.
[103] S. Shlaer and S. Mellor, Object-Oriented Systems: Modeling the World in Data, Yourdon Press, Englewood Cliffs, N.J., 1988.
[104] S. Simon and J. Lesage, “The Impact of Collinearity Involving the Intercept Term on the Numerical Accuracy of Regression,” Computer Science in Economics and Management, vol. 1, pp. 137-152, 1988.
[105] K. Smith, M. Slattery, and T. French, “Collinear Nutrients and the Risk of Colon Cancer,” J. Clinical Epidemiology, vol. 44, no. 7, pp. 715-723, 1991.
[106] M.H. Tang, M.H. Kao, and M.H. Chen, “An Empirical Study on Object Oriented Metrics,” Proc. Sixth Int'l Software Metrics Symp., pp. 242-249, 1999.
[107] B. Unger and L. Prechelt, “The Impact of Inheritance Depth on Maintenance Tasks—Detailed Description and Evaluation of Two Experiment Replications,” Technical Report 19/1998, Fakultät für Informatik, Univ. Karlsruhe, 1998.
[108] Y. Wax, “Collinearity Diagnosis for Relative Risk Regression Analysis: An Application to Assessment of Diet-Cancer Relationship in Epidemiological Studies,” Statistics in Medicine, vol. 11, pp. 1273-1287, 1992.
[109] S. Wiedenbeck, V. Ramalingam, S. Sarasamma, and C. Corritore, “A Comparison of the Comprehension of Object-Oriented and Procedural Programs by Novice Programmers,” Interacting with Computers, vol. 11, no. 3, pp. 255-282, 1999.
[110] N. Wilde, P. Matthews, and R. Huitt, “Maintaining Object-Oriented Software,” IEEE Software, vol. 10, no. 1, pp. 75–80, Jan. 1993.

Index Terms:
Object-oriented metrics, software quality, metrics validation, validation methodology, object-oriented quality, coupling metrics, inheritance metrics, cohesion metrics.
Citation:
Khaled El Emam, Saïda Benlarbi, Nishith Goel, Shesh N. Rai, "The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics," IEEE Transactions on Software Engineering, vol. 27, no. 7, pp. 630-650, July 2001, doi:10.1109/32.935855