Issue No. 04 - July/August (2010 vol. 36)
pp: 546-558
Raymond P.L. Buse, University of Virginia, Charlottesville
In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from 120 human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80 percent effective and better than a human, on average, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with three measures of software quality: code changes, automated defect reports, and defect log messages. We measure these correlations on over 2.2 million lines of code, as well as longitudinally, over many releases of selected projects. Finally, we discuss the implications of this study on programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.
Software readability, program understanding, machine learning, software maintenance, code metrics, FindBugs.
Raymond P.L. Buse, "Learning a Metric for Code Readability", IEEE Transactions on Software Engineering, vol. 36, no. 4, pp. 546-558, July/August 2010, doi:10.1109/TSE.2009.70