Issue No. 6 - November/December 2011 (vol. 37)
pp. 772-787
Yonghee Shin , DePaul University, Chicago
Andrew Meneely , North Carolina State University, Raleigh
Laurie Williams , North Carolina State University, Raleigh
Jason A. Osborne , North Carolina State University, Raleigh
ABSTRACT
Security inspection and testing require experts in security who think like an attacker. Security experts need to know code locations on which to focus their testing and inspection efforts. Since vulnerabilities are rare occurrences, locating vulnerable code locations can be a challenging task. We investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations. If so, security experts can use this prediction to prioritize security inspection and testing efforts. The metrics we investigated fall into three categories: complexity, code churn, and developer activity metrics. We performed two empirical case studies on large, widely used open-source projects: the Mozilla Firefox web browser and the Red Hat Enterprise Linux kernel. The results indicate that 24 of the 28 metrics collected are discriminative of vulnerabilities for both projects. The models using all three types of metrics together predicted over 80 percent of the known vulnerable files with less than 25 percent false positives for both projects. Compared to a random selection of files for inspection and testing, these models would have reduced the number of files and the number of lines of code to inspect or test by over 71 and 28 percent, respectively, for both projects.
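The prioritization idea summarized above can be illustrated with a minimal sketch. The data and the scoring rule here are hypothetical: the study's actual models are statistical classifiers fit to 28 metrics, whereas this toy example simply normalizes three representative metrics (complexity, code churn, developer count), ranks files by their combined score, and reports how many known-vulnerable files an inspector would find by examining only the top-ranked fraction.

```python
# Hypothetical sketch of metric-based inspection prioritization.
# Not the paper's models: a toy score over three normalized metrics.

def normalize(values):
    """Scale a list of numbers into [0, 1]; constant lists map to 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def rank_files(files):
    """files: list of (name, complexity, churn, developers, vulnerable).
    Returns files sorted by descending combined metric score."""
    comp = normalize([f[1] for f in files])
    churn = normalize([f[2] for f in files])
    devs = normalize([f[3] for f in files])
    scored = [(c + ch + d, f) for c, ch, d, f in zip(comp, churn, devs, files)]
    return [f for _, f in sorted(scored, key=lambda t: t[0], reverse=True)]

def recall_at(ranked, fraction):
    """Fraction of known-vulnerable files found by inspecting the top slice."""
    k = max(1, int(len(ranked) * fraction))
    total = sum(1 for f in ranked if f[4])
    found = sum(1 for f in ranked[:k] if f[4])
    return found / total if total else 0.0

# Invented example data: (name, complexity, churn, developers, vulnerable)
files = [
    ("a.c", 120, 40, 6, True),
    ("b.c", 15, 2, 1, False),
    ("c.c", 90, 35, 5, True),
    ("d.c", 10, 1, 2, False),
    ("e.c", 60, 20, 3, False),
]
ranked = rank_files(files)
print(recall_at(ranked, 0.4))  # recall after inspecting the top 40% of files
```

On this contrived data both vulnerable files land at the top of the ranking, so inspecting 40 percent of the files finds 100 percent of them; the study's headline result (over 80 percent of vulnerable files found while inspecting far fewer files than random selection) is the real-world analogue of this measurement.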
INDEX TERMS
Fault prediction, software metrics, software security, vulnerability prediction.
CITATION
Yonghee Shin, Andrew Meneely, Laurie Williams, Jason A. Osborne, "Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, no. 6, pp. 772-787, November/December 2011, doi:10.1109/TSE.2010.81