Issue No. 10 - Oct. 2013 (vol. 25)

pp: 2356-2366

H. Altay Guvenir, Bilkent University, Ankara

Murat Kurtcephe, Case Western Reserve University, Cleveland

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.214

ABSTRACT

In recent years, the problem of learning a real-valued function that induces a ranking over an instance space has gained importance in the machine learning literature. Here, we propose a supervised algorithm, called ranking instances by maximizing the area under the ROC curve (RIMARC), that learns a ranking function. Since the area under the ROC curve (AUC) is a widely accepted performance measure for evaluating the quality of a ranking, the algorithm aims to maximize the AUC value directly. For a single categorical feature, we show the necessary and sufficient condition that any ranking function must satisfy to achieve the maximum AUC. We also sketch a method to discretize a continuous feature so that it, too, reaches the maximum AUC. RIMARC uses a heuristic to extend this maximization to all features of a data set. The ranking function learned by the RIMARC algorithm is in a human-readable form; therefore, it provides valuable information to domain experts for decision making. The performance of RIMARC is evaluated on many real-life data sets and compared against different state-of-the-art algorithms. Evaluations of the AUC metric show that RIMARC achieves significantly better performance than other similar methods.
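
To make the single-feature claim concrete, here is a minimal Python sketch, not the authors' implementation. It computes AUC as the Wilcoxon-Mann-Whitney statistic and scores each categorical value by its fraction of positive instances. That scoring rule is an assumption made for illustration, since the abstract does not spell out the condition itself. The function names and the toy data are hypothetical. A brute-force search over all orderings of the values is included only to check that the assumed rule reaches the best achievable AUC on this small example.

```python
from collections import defaultdict
from itertools import permutations


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def positive_fraction_scores(values, labels):
    """Map each categorical value to its fraction of positive instances.

    Assumed scoring rule for illustration; not taken from the abstract.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for v, y in zip(values, labels):
        counts[v][0] += y
        counts[v][1] += 1
    return {v: p / t for v, (p, t) in counts.items()}


def best_auc_by_brute_force(values, labels):
    """Try every strict ordering of the distinct values; return the best AUC."""
    distinct = sorted(set(values))
    best = 0.0
    for perm in permutations(range(len(distinct))):
        mapping = dict(zip(distinct, perm))
        best = max(best, auc([mapping[v] for v in values], labels))
    return best


if __name__ == "__main__":
    # Hypothetical toy data: one categorical feature, binary labels.
    values = ["a", "a", "a", "b", "b", "b", "c", "c"]
    labels = [1, 1, 0, 0, 0, 1, 0, 0]

    score_of = positive_fraction_scores(values, labels)
    ranked_auc = auc([score_of[v] for v in values], labels)
    print("AUC with positive-fraction scores:", ranked_auc)
    print("Best AUC over all value orderings:", best_auc_by_brute_force(values, labels))
```

The brute-force check is exponential in the number of distinct values and is there only to validate the toy case. The per-value scores themselves form a small lookup table, which is one way a learned ranking function can stay human-readable.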

INDEX TERMS

Training, Nickel, Algorithm design and analysis, Machine learning algorithms, Machine learning, Measurement, Training data, decision support, Ranking, data mining

CITATION

H. Altay Guvenir, Murat Kurtcephe, "Ranking Instances by Maximizing the Area under ROC Curve",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 25, no. 10, pp. 2356-2366, Oct. 2013, doi:10.1109/TKDE.2012.214