Issue No.06 - June (2013 vol.62)
pp: 1221-1233
Hanjiang Lai, Sun Yat-sen University, Guangzhou
Yan Pan, Sun Yat-sen University, Guangzhou
Cong Liu, Sun Yat-sen University, Guangzhou
Liang Lin, Sun Yat-sen University, Guangzhou
Jie Wu, Temple University, Philadelphia
ABSTRACT
Learning-to-rank for information retrieval has gained increasing interest in recent years. Inspired by the success of sparse models, we consider the problem of sparse learning-to-rank, where the learned ranking models are constrained to have only a few nonzero coefficients. We begin by formulating the sparse learning-to-rank problem as a convex optimization problem with a sparsity-inducing $\ell_1$ constraint. Since the $\ell_1$ constraint is nondifferentiable, the critical issue is how to solve the optimization problem efficiently. To address this issue, we propose a learning algorithm from the primal-dual perspective. Furthermore, we prove that, after at most $O(1/\epsilon)$ iterations, the proposed algorithm is guaranteed to obtain an $\epsilon$-accurate solution. This convergence rate is better than the $O(1/\epsilon^2)$ rate of the popular subgradient descent algorithm. Empirical evaluation on several public benchmark data sets demonstrates the effectiveness of the proposed algorithm: 1) Compared to methods that learn dense models, learning a ranking model with sparsity constraints significantly improves ranking accuracy. 2) Compared to other methods for sparse learning-to-rank, the proposed algorithm tends to obtain sparser models and performs better in both ranking accuracy and training time. 3) Compared to several state-of-the-art algorithms, the ranking accuracy of the proposed algorithm is very competitive and stable.
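For orientation, the sketch below illustrates the kind of baseline the abstract compares against: projected subgradient descent on an $\ell_1$-constrained ranking objective. It is not the primal-dual algorithm proposed in the paper. The pairwise hinge loss, the $1/\sqrt{t}$ step size, and all function names (`project_l1_ball`, `pairwise_hinge_subgradient`, `projected_subgradient`) are assumptions made for illustration only; the abstract does not specify them. Ignoring constants, a rate of $O(1/\epsilon^2)$ suggests on the order of 10,000 iterations for $\epsilon = 0.01$, whereas $O(1/\epsilon)$ suggests on the order of 100.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # magnitudes, descending
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    # Largest index whose magnitude survives the soft threshold.
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def pairwise_hinge_subgradient(w, diffs):
    """Subgradient of the mean hinge loss max(0, 1 - w.d), where each row d
    of diffs is (features of preferred doc) - (features of other doc).
    The hinge loss is an assumed stand-in, not taken from the paper."""
    active = diffs @ w < 1.0               # pairs that violate the margin
    return -diffs[active].sum(axis=0) / len(diffs)

def projected_subgradient(diffs, radius=1.0, steps=200):
    """Baseline solver: subgradient step, then projection onto the l1 ball."""
    w = np.zeros(diffs.shape[1])
    for t in range(1, steps + 1):
        g = pairwise_hinge_subgradient(w, diffs)
        w = project_l1_ball(w - g / np.sqrt(t), radius)
    return w
```

The projection step is what enforces the $\ell_1$ constraint and drives coefficients to exactly zero; a smaller `radius` yields a sparser model.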
INDEX TERMS
Prediction algorithms, Optimization, Machine learning algorithms, Vectors, Computational modeling, Support vector machines, Accuracy, Fenchel duality, Learning-to-rank, sparse models, ranking algorithm
CITATION
Hanjiang Lai, Yan Pan, Cong Liu, Liang Lin, Jie Wu, "Sparse Learning-to-Rank via an Efficient Primal-Dual Algorithm", IEEE Transactions on Computers, vol.62, no. 6, pp. 1221-1233, June 2013, doi:10.1109/TC.2012.62