Issue No.06 - June (2013 vol.35)
pp: 1370-1382
Nan Li, Nanjing University, Nanjing and Soochow University, Suzhou
Ivor W. Tsang, Nanyang Technological University, Singapore
Zhi-Hua Zhou, Nanjing University, Nanjing
ABSTRACT
In practical applications, machine learning algorithms are often needed to learn classifiers that optimize domain-specific performance measures. Previous research has focused on learning the needed classifier in isolation, yet learning a nonlinear classifier for nonlinear and nonsmooth performance measures remains hard. In this paper, rather than learning the needed classifier by directly optimizing a specific performance measure, we circumvent this problem by proposing a novel two-step approach called CAPO: first train nonlinear auxiliary classifiers with existing learning methods, and then adapt the auxiliary classifiers for specific performance measures. In the first step, auxiliary classifiers can be obtained efficiently with off-the-shelf learning algorithms. For the second step, we show that the classifier adaptation problem can be reduced to a quadratic programming problem, which is similar to linear ${\rm SVM}^{\rm perf}$ and can be solved efficiently. By exploiting nonlinear auxiliary classifiers, CAPO can generate nonlinear classifiers that optimize a large variety of performance measures, including all the performance measures based on the contingency table and AUC, while maintaining high computational efficiency. Empirical studies show that CAPO is effective and computationally efficient, and that it is even more efficient than linear ${\rm SVM}^{\rm perf}$.
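To make the two-step idea concrete, the following is a minimal sketch, not the authors' method. It assumes labels in {+1, -1} and takes the real-valued scores of an already trained auxiliary classifier as given (step one). For step two it swaps in a deliberately simple stand-in: a search over a single bias term that maximizes a contingency-table measure, here F1. The paper instead reduces adaptation to a quadratic programming problem, which this sketch does not implement. All function names (`contingency_table`, `f1_score`, `auc`, `predict`, `adapt_offset`) are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def contingency_table(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fp, fn, tn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 computed from the contingency table: 2*tp / (2*tp + fp + fn)."""
    tp, fp, fn, _ = contingency_table(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(scores: Sequence[float], bias: float) -> List[int]:
    """Threshold the shifted auxiliary scores: +1 if score + bias > 0, else -1."""
    return [1 if s + bias > 0 else -1 for s in scores]


def adapt_offset(
    y_true: Sequence[int],
    aux_scores: Sequence[float],
    measure: Callable[[Sequence[int], Sequence[int]], float],
) -> Tuple[float, float]:
    """Toy adaptation (an assumption, NOT the paper's QP): pick the bias that maximizes `measure`."""
    vals = sorted(set(aux_scores))
    # Candidate thresholds: below all scores, between neighbours, above all scores.
    thresholds = [vals[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    thresholds += [vals[-1] + 1.0]
    best_bias, best_val = 0.0, measure(y_true, predict(aux_scores, 0.0))
    for th in thresholds:
        val = measure(y_true, predict(aux_scores, -th))
        if val > best_val:
            best_bias, best_val = -th, val
    return best_bias, best_val


if __name__ == "__main__":
    # Hypothetical auxiliary scores that rank well but are biased toward the negative class.
    y = [1, 1, 1, -1, -1, -1, -1, -1]
    s = [0.4, -0.2, -0.3, -0.5, -0.9, -1.2, -0.6, -1.5]
    print("F1 before adaptation:", f1_score(y, predict(s, 0.0)))
    print("AUC of auxiliary scores:", auc(y, s))
    print("(bias, F1) after adaptation:", adapt_offset(y, s, f1_score))
```

On this toy data the unadapted scores give an F1 of 0.5 even though their AUC is 1.0, and shifting the bias raises F1 to 1.0. It illustrates why a classifier trained for one measure may need adaptation before it performs well on another.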
INDEX TERMS
Loss measurement, Algorithm design and analysis, Training, Vectors, Upper bound, Kernel, Educational institutions, curriculum learning, Optimize performance measures, classifier adaptation, ensemble learning
CITATION
Nan Li, Ivor W. Tsang, Zhi-Hua Zhou, "Efficient Optimization of Performance Measures by Classifier Adaptation", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 6, pp. 1370-1382, June 2013, doi:10.1109/TPAMI.2012.172