Issue No.11 - November (2011 vol.23)

pp: 1601-1618

Gustavo E.A.P.A. Batista, Universidade de São Paulo (USP), São Carlos

Ronaldo C. Prati, Universidade Federal do ABC (UFABC), Santo André

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.59

ABSTRACT

Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to their simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. For this reason, various graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these methods lies in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to select a suitable graphical method for a given task, it is crucial to identify each method's strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on which methods are more suitable for different situations.
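The point about scalar summaries can be illustrated with a minimal sketch. The toy labels, the two hypothetical classifiers, and the helper `confusion_rates` below are assumptions made for illustration and are not taken from the paper. Both classifiers have the same error rate, yet they occupy different positions in the two-dimensional space of true positive rate and false positive rate, which is the space an ROC graph plots.

```python
def confusion_rates(labels, predictions):
    """Return (error_rate, true_positive_rate, false_positive_rate)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    error_rate = (fp + fn) / len(pairs)
    tpr = tp / (tp + fn)  # fraction of positives correctly detected
    fpr = fp / (fp + tn)  # fraction of negatives wrongly flagged
    return error_rate, tpr, fpr


# Hypothetical toy data: 4 positive examples followed by 6 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Classifier A misses two positives and raises no false alarms.
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Classifier B finds every positive but raises two false alarms.
pred_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

rates_a = confusion_rates(labels, pred_a)
rates_b = confusion_rates(labels, pred_b)

for name, (err, tpr, fpr) in (("A", rates_a), ("B", rates_b)):
    print(f"classifier {name}: error={err:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Which of the two classifiers is preferable depends on the relative cost of a missed positive versus a false alarm, information that the shared error rate does not carry.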

INDEX TERMS

Machine learning, data mining, performance evaluation, ROC curves, cost curves, lift graphs.

CITATION

Gustavo E.A.P.A. Batista, Ronaldo C. Prati, "A Survey on Graphical Methods for Classification Predictive Performance Evaluation", *IEEE Transactions on Knowledge & Data Engineering*, vol. 23, no. 11, pp. 1601-1618, November 2011, doi:10.1109/TKDE.2011.59