Issue No.07 - July (2008 vol.30)
pp: 1158-1170
ABSTRACT
We consider the problem of learning the ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on the training data. Relying on an $\epsilon$-accurate approximation for the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from $\mathcal{O}(m^2)$ to $\mathcal{O}(m)$, where $m$ is the number of training samples. Experiments on public benchmarks for ordinal regression and collaborative filtering indicate that the proposed algorithm matches the ranking accuracy of the best available methods when the algorithms are trained on the same data. Moreover, because it is several orders of magnitude faster than the current state-of-the-art approaches, it can leverage much larger training datasets.
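For orientation, the following is a minimal sketch of the plain two-class Wilcoxon-Mann-Whitney statistic that the abstract generalizes. It is not the authors' algorithm. The function name `wmw_statistic`, the half-credit convention for ties, and the sample scores are assumptions made for illustration. The double loop over pairs shows where a quadratic cost in the number of samples comes from when pairwise quantities are evaluated directly.

```python
from itertools import product


def wmw_statistic(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs that the scores order correctly.

    Ties receive half credit (an assumed convention, not taken from the paper).
    Every pair is visited once, so the cost grows with
    len(pos_scores) * len(neg_scores).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")

    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0   # pair ranked correctly
        elif p == n:
            total += 0.5   # tie: half credit
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical scores from some ranking function.
positives = [0.9, 0.8]
negatives = [0.1, 0.85]
print(wmw_statistic(positives, negatives))
```

With these hypothetical scores, three of the four pairs are ordered correctly, so the statistic is 0.75. A value of 1.0 would mean every positive sample is scored above every negative one.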
INDEX TERMS
Machine learning, Algorithms
CITATION
Vikas C. Raykar, Ramani Duraiswami, Balaji Krishnapuram, "A Fast Algorithm for Learning a Ranking Function from Large-Scale Data Sets", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 7, pp. 1158-1170, July 2008, doi:10.1109/TPAMI.2007.70776