Maximum Likelihood Model Selection for 1-Norm Soft Margin SVMs with Multiple Parameters
August 2010 (vol. 32, no. 8)
pp. 1522-1528
Tobias Glasmachers, Dalle Molle Institute for Artificial Intelligence (IDSIA), Manno-Lugano
Christian Igel, Ruhr-Universität Bochum, Bochum
Adapting the hyperparameters of support vector machines (SVMs) is a challenging model selection problem, especially when flexible kernels are to be adapted and data are scarce. We present a coherent framework for regularized model selection of 1-norm soft margin SVMs for binary classification. We propose gradient ascent on a likelihood function of the hyperparameters. The likelihood function is based on logistic regression for robustly estimating the class-conditional probabilities and can be computed efficiently. Overfitting is an important issue in SVM model selection and can be addressed in our framework by incorporating suitable prior distributions over the hyperparameters. We show empirically that gradient-based optimization of the likelihood function adapts multiple kernel parameters and leads to better models than four competing state-of-the-art methods.
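The core objective described above can be illustrated with a minimal sketch: SVM decision values are mapped to class-conditional probabilities through a Platt-style sigmoid, and the resulting log-likelihood of the training labels is treated as a function of the hyperparameters to be maximized. The sigmoid parameters `A` and `B` are fixed here for illustration (the paper fits them as part of the model), and central finite differences stand in for the exact gradients derived in the paper.

```python
import math

def platt_prob(f, A=-1.0, B=0.0):
    # Platt-style sigmoid mapping an SVM decision value f to P(y = +1 | x).
    # A and B are fixed here for illustration; in the paper's framework they
    # are estimated by logistic regression.
    return 1.0 / (1.0 + math.exp(A * f + B))

def log_likelihood(decision_values, labels, A=-1.0, B=0.0):
    # Log-likelihood of the labels under the sigmoid probability model.
    # Viewed as a function of the SVM hyperparameters (which determine the
    # decision values), this is the quantity being ascended; a log-prior
    # over the hyperparameters can be added for regularization.
    ll = 0.0
    for f, y in zip(decision_values, labels):
        p = platt_prob(f, A, B)
        ll += math.log(p if y == +1 else 1.0 - p)
    return ll

def ascent_step(objective, theta, lr=0.1, eps=1e-5):
    # One gradient-ascent step on a hyperparameter vector theta, using
    # central finite differences as a stand-in for the exact gradient.
    grad = [(objective(theta[:i] + [t + eps] + theta[i + 1:]) -
             objective(theta[:i] + [t - eps] + theta[i + 1:])) / (2 * eps)
            for i, t in enumerate(theta)]
    return [t + lr * g for t, g in zip(theta, grad)]
```

Because the sigmoid makes the objective smooth in the decision values, multiple kernel parameters can be adapted jointly by repeating `ascent_step` until convergence, retraining the SVM at each candidate hyperparameter setting.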

[1] B.E. Boser, I.M. Guyon, and V.N. Vapnik, "A Training Algorithm for Optimal Margin Classifiers," Proc. Fifth Ann. Workshop Computational Learning Theory, pp. 144-152, 1992.
[2] C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[3] V. Vapnik, Statistical Learning Theory. Wiley, 1998.
[4] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
[5] T. Jaakkola, M. Diekhans, and D. Haussler, "Using the Fisher Kernel Method to Detect Remote Protein Homologies," Proc. Seventh Int'l Conf. Intelligent Systems for Molecular Biology, pp. 149-158, 1999.
[6] V. Vapnik and O. Chapelle, "Bounds on Error Expectation for Support Vector Machines," Neural Computation, vol. 12, pp. 2013-2036, 2000.
[7] N. Cristianini, A. Elisseeff, J. Shawe-Taylor, and J. Kandola, "On Kernel-Target Alignment," Advances in Neural Information Processing Systems, pp. 367-373, MIT Press, 2001.
[8] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 46, no. 1, pp. 131-159, 2002.
[9] S.S. Keerthi, "Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms," IEEE Trans. Neural Networks, vol. 13, no. 5, pp. 1225-1229, Sept. 2002.
[10] T. Glasmachers and C. Igel, "Gradient-Based Adaptation of General Gaussian Kernels," Neural Computation, vol. 17, no. 10, pp. 2099-2105, 2005.
[11] S.S. Keerthi, V. Sindhwani, and O. Chapelle, "An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models," Advances in Neural Information Processing Systems, vol. 19, B. Schölkopf, J. Platt, and T. Hoffman, eds., MIT Press, 2007.
[12] C. Igel, T. Glasmachers, B. Mersch, N. Pfeifer, and P. Meinicke, "Gradient-Based Optimization of Kernel-Target Alignment for Sequence Kernels Applied to Bacterial Gene Start Detection," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 2, pp. 216-226, Apr. 2007.
[13] P.S. Bradley and O.L. Mangasarian, "Feature Selection via Concave Minimization and Support Vector Machines," Proc. Int'l Conf. Machine Learning, pp. 82-90, 1998.
[14] J. Platt, "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods," Advances in Large Margin Classifiers, pp. 61-74, MIT Press, 1999.
[15] M.E. Tipping, "Sparse Bayesian Learning and the Relevance Vector Machine," J. Machine Learning Research, vol. 1, pp. 211-244, 2001.
[16] T. Zhang, "Statistical Behavior and Consistency of Classification Methods Based on Convex Risk Minimization," Annals of Statistics, vol. 32, no. 1, pp. 56-85, 2004.
[17] P.L. Bartlett and A. Tewari, "Sparseness vs Estimating Conditional Probabilities: Some Asymptotic Results," J. Machine Learning Research, vol. 8, pp. 775-790, 2007.
[18] M. Opper and O. Winther, "Gaussian Process Classification and SVM: Mean Field Results," Advances in Large Margin Classifiers, P. Bartlett, B. Schölkopf, D. Schuurmans, and A. Smola, eds., MIT Press, 1999.
[19] M. Seeger, "Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers," Advances in Neural Information Processing Systems, vol. 12, pp. 603-609, MIT Press, 2000.
[20] C. Gold and P. Sollich, "Model Selection for Support Vector Machine Classification," Neurocomputing, vol. 55, nos. 1-2, pp. 221-249, 2003.
[21] G.C. Cawley and N.L.C. Talbot, "Preventing Over-Fitting During Model Selection via Bayesian Regularisation of the Hyper-Parameters," J. Machine Learning Research, vol. 8, pp. 841-861, 2007.
[22] T. Glasmachers and C. Igel, "Maximum-Gain Working Set Selection for Support Vector Machines," J. Machine Learning Research, vol. 7, pp. 1437-1466, 2006.
[23] C. Igel, T. Glasmachers, and V. Heidrich-Meisner, "Shark," J. Machine Learning Research, vol. 9, pp. 993-996, 2008.
[24] F. Friedrichs and C. Igel, "Evolutionary Tuning of Multiple SVM Parameters," Neurocomputing, vol. 64, no. C, pp. 107-117, 2005.
[25] T. Glasmachers and C. Igel, "Uncertainty Handling in Model Selection for Support Vector Machines," Parallel Problem Solving from Nature, G. Rudolph, T. Jansen, S. Lucas, C. Poloni, and N. Beume, eds., pp. 185-194, Springer, 2008.
[26] K.M. Chung, W.C. Kao, C.L. Sun, L.L. Wang, and C.-J. Lin, "Radius Margin Bounds for Support Vector Machines with the RBF Kernel," Neural Computation, vol. 15, no. 11, pp. 2643-2681, 2003.
[27] K. Duan, S.S. Keerthi, and A.N. Poo, "Evaluation of Simple Performance Measures for Tuning SVM Hyperparameters," Neurocomputing, vol. 51, no. 1, pp. 41-60, 2003.
[28] H.-T. Lin, C.-J. Lin, and R.C. Weng, "A Note on Platt's Probabilistic Outputs for Support Vector Machines," Machine Learning, vol. 68, pp. 267-276, 2007.
[29] O. Chapelle, "Support Vector Machines: Induction Principle, Adaptive Tuning and Prior Knowledge," PhD dissertation, Laboratoire d'Informatique de Paris 6, 2002.
[30] G. Wahba, "Soft and Hard Classification by Reproducing Kernel Hilbert Space Methods," Proc. Nat'l Academy of Sciences USA, vol. 99, no. 26, pp. 16524-16530, 2002.
[31] G.C. Cawley and N.L.C. Talbot, "Efficient Approximate Leave-One-Out Cross-Validation for Kernel Logistic Regression," Machine Learning, vol. 71, no. 2, pp. 243-264, 2008.
[32] T. Suttorp, N. Hansen, and C. Igel, "Efficient Covariance Matrix Update for Variable Metric Evolution Strategies," Machine Learning, vol. 75, no. 2, pp. 167-197, 2009.
[33] N. Hansen and A. Ostermeier, "Completely Derandomized Self-Adaptation in Evolution Strategies," Evolutionary Computation, vol. 9, no. 2, pp. 159-195, 2001.
[34] C. Igel and M. Hüsken, "Empirical Evaluation of the Improved Rprop Learning Algorithm," Neurocomputing, vol. 50, pp. 105-123, 2003.
[35] G. Rätsch, T. Onoda, and K.-R. Müller, "Soft Margins for AdaBoost," Machine Learning, vol. 42, no. 3, pp. 287-320, 2001.
[36] A. Asuncion and D.J. Newman, "UCI Machine Learning Repository," 2007.

Index Terms:
Support vector machines, model selection, regularization, maximum likelihood.
Tobias Glasmachers, Christian Igel, "Maximum Likelihood Model Selection for 1-Norm Soft Margin SVMs with Multiple Parameters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 8, pp. 1522-1528, Aug. 2010, doi:10.1109/TPAMI.2010.95