Using AUC and Accuracy in Evaluating Learning Algorithms
March 2005 (vol. 17 no. 3)
pp. 299-310
The area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has been traditionally used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. In this paper, we establish formal criteria for comparing two different measures for learning algorithms and we show theoretically and empirically that AUC is a better measure (defined precisely) than accuracy. We then reevaluate well-established claims in machine learning based on accuracy using AUC and obtain interesting and surprising new results. For example, it has been well-established and accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in AUC. The conclusions drawn in this paper may make a significant impact on machine learning and data mining applications.
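To make the contrast between the two measures concrete, the following is a minimal sketch, not the paper's own code or data. It assumes binary labels, a fixed decision threshold of 0.5 for accuracy, and the standard pairwise (Mann-Whitney) formulation of AUC. The function names `accuracy` and `auc` and the two toy rankings are illustrative assumptions. Both hypothetical classifiers misclassify the same number of examples, yet they rank positives against negatives very differently.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Pairwise AUC: the probability that a randomly chosen positive
    is scored higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Ten examples with scores 0.05, 0.15, ..., 0.95 (lowest to highest).
scores = [(i + 0.5) / 10 for i in range(10)]

# Toy rankings (assumed for illustration): the label at each score position.
labels_a = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # errors sit next to the threshold
labels_b = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # errors sit at the extremes

for name, labels in (("A", labels_a), ("B", labels_b)):
    print(name, "accuracy =", accuracy(labels, scores),
          "AUC =", auc(labels, scores))
```

In this toy case both classifiers reach an accuracy of 0.8, while the pairwise AUC is 24/25 for A and 16/25 for B. Accuracy cannot separate the two, whereas AUC reflects how badly the misranked examples are placed.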

Index Terms:
Evaluation of learning algorithms, ROC, AUC of ROC, accuracy.
Jin Huang, Charles X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 299-310, March 2005, doi:10.1109/TKDE.2005.50