Issue No.06 - November/December (2010 vol.36)
pp: 852-864
Yi (Cathy) Liu, Georgia College & State University, Milledgeville, GA
Taghi M. Khoshgoftaar, Florida Atlantic University, Boca Raton, FL
Naeem Seliya, University of Michigan-Dearborn, Dearborn, MI
ABSTRACT
A novel search-based approach to software quality modeling with multiple software project repositories is presented. Training a software quality model with only one software measurement and defect data set may not effectively encapsulate quality trends of the development organization. The inclusion of additional software projects during the training process can provide a cross-project perspective on software quality modeling and prediction. The genetic-programming-based approach includes three strategies for modeling with multiple software projects: Baseline Classifier, Validation Classifier, and Validation-and-Voting Classifier. The last of these is shown to provide better generalization and more robust software quality models. This is based on a case study of software metrics and defect data from seven real-world systems. A second case study considers 17 different (nonevolutionary) machine learners for modeling with multiple software data sets. Both case studies use a similar majority-voting approach for predicting the fault-proneness class of program modules. It is shown that the total cost of misclassification of the search-based software quality models is consistently lower than that of the non-search-based models. This study provides clear guidance to practitioners interested in exploiting their organization's software measurement data repositories for improved software quality modeling.
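The majority-voting step and the total cost of misclassification mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label names ("fp" for fault-prone, "nfp" for not fault-prone), the tie-breaking rule, the function names, and the cost weights are all assumptions made for illustration. The cost is taken here as a weighted sum of Type I errors (an nfp module predicted fp) and Type II errors (an fp module predicted nfp), a common formulation that the abstract itself does not spell out.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model labels into one label per module.

    predictions[k][i] is the label ("fp" or "nfp") that model k,
    e.g. a model trained on project k, assigns to module i.
    """
    n_modules = len(predictions[0])
    voted = []
    for i in range(n_modules):
        votes = Counter(model[i] for model in predictions)
        # Assumption: ties are resolved toward "fp" (the cautious choice).
        voted.append("fp" if votes["fp"] >= votes["nfp"] else "nfp")
    return voted


def total_misclassification_cost(
    actual: Sequence[str],
    predicted: Sequence[str],
    cost_type1: float = 1.0,   # hypothetical cost of a false alarm
    cost_type2: float = 10.0,  # hypothetical cost of a missed fault-prone module
) -> float:
    """Weighted sum of Type I and Type II misclassifications."""
    type1 = sum(a == "nfp" and p == "fp" for a, p in zip(actual, predicted))
    type2 = sum(a == "fp" and p == "nfp" for a, p in zip(actual, predicted))
    return cost_type1 * type1 + cost_type2 * type2


# Three models voting on three modules (made-up labels).
model_outputs = [
    ["fp", "nfp", "nfp"],
    ["fp", "fp", "nfp"],
    ["nfp", "fp", "nfp"],
]
voted_labels = majority_vote(model_outputs)
actual_labels = ["fp", "nfp", "fp"]
cost = total_misclassification_cost(actual_labels, voted_labels)
```

With these made-up inputs, the vote yields one false alarm and one missed fault-prone module. Weighting Type II errors more heavily reflects the usual assumption that shipping an undetected fault-prone module costs more than inspecting a sound one.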
INDEX TERMS
Genetic programming, optimization, software quality, defects, machine learning, software measurement.
CITATION
Yi (Cathy) Liu, Taghi M. Khoshgoftaar, Naeem Seliya, "Evolutionary Optimization of Software Quality Modeling with Multiple Repositories," IEEE Transactions on Software Engineering, vol. 36, no. 6, pp. 852-864, November/December 2010, doi:10.1109/TSE.2010.51