Vol. 35, No. 9, Sept. 2013, pp. 2051-2063
Qi Mao, School of Computer Engineering, Nanyang Technological University, Singapore
I. W.-H. Tsang, School of Computer Engineering, Nanyang Technological University, Singapore
ABSTRACT
Feature selection tailored to specific multivariate performance measures is key to the success of many applications, such as image retrieval and text classification, yet existing feature selection methods are usually designed to minimize classification error. In this paper, we propose a generalized sparse regularizer and, based on it, a unified feature selection framework for general loss functions. In particular, we study the novel paradigm of selecting features by directly optimizing multivariate performance measures. The resultant formulation is a challenging problem for high-dimensional data, so we propose a two-layer cutting plane algorithm to solve it and establish its convergence. In addition, we adapt the proposed method to optimize multivariate measures for multiple-instance learning problems. Analysis against state-of-the-art feature selection methods shows that the proposed method is superior. Extensive experiments on large-scale, high-dimensional real-world datasets show that the proposed method outperforms l1-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly better F1-scores than SVMperf.
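The F1-score the abstract refers to is a multivariate measure: it is defined over the entire prediction vector rather than decomposing over individual examples, which is why structural-SVM approaches such as SVMperf optimize it directly. A minimal sketch of the corresponding loss, Delta(y_pred, y_true) = 1 - F1, assuming binary labels in {0, 1} (the function name `f1_loss` is illustrative, not from the paper):

```python
def f1_loss(y_true, y_pred):
    """Return 1 - F1 for two binary label vectors (1 = positive).

    F1 = 2*tp / (2*tp + fp + fn); the loss is computed over the whole
    vector at once, so it does not decompose into per-example terms.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    if denom == 0:  # no positives present or predicted: define loss as 0
        return 0.0
    return 1.0 - 2.0 * tp / denom

# One true positive, one false positive, one false negative:
print(f1_loss([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Because such a loss is non-decomposable, standard per-example (e.g., error-rate) feature selection machinery does not apply directly, motivating the cutting-plane treatment described above.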
INDEX TERMS
Loss measurement, vectors, support vector machines, kernel, convergence, error analysis, optimization, structural SVMs, feature selection, performance measure, multiple kernel learning, multi-instance learning
CITATION
Qi Mao and I. W.-H. Tsang, "A Feature Selection Method for Multivariate Performance Measures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2051-2063, Sept. 2013, doi: 10.1109/TPAMI.2012.266.