Issue No. 02 - February 2011 (vol. 33)
pp. 368-381
Şeyda Ertekin , Massachusetts Institute of Technology, Cambridge
Léon Bottou , NEC Labs America, Princeton
C. Lee Giles , The Pennsylvania State University, University Park
ABSTRACT
In this paper, we propose a nonconvex online Support Vector Machine (SVM) algorithm (LASVM-NC) based on the Ramp Loss, which strongly suppresses the influence of outliers. Then, again in the online learning setting, we propose an outlier filtering mechanism (LASVM-I) that approximates nonconvex behavior within convex optimization. Both algorithms build upon another novel SVM algorithm (LASVM-G) that leverages the duality gap to generate accurate intermediate models in its iterative steps. We present experimental results demonstrating that our frameworks achieve significant robustness to outliers in noisy data classification where mislabeled training instances are abundant. The experiments show that the proposed approaches yield a more scalable online SVM algorithm with sparser models and shorter running times, in both the training and recognition phases, without sacrificing generalization performance. We also point out the relation between nonconvex optimization and min-margin active learning.
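To illustrate why the Ramp Loss suppresses outliers, the sketch below compares it against the standard Hinge Loss, following the formulation of Collobert et al. [12]: the ramp loss R_s(z) = H_1(z) - H_s(z), where H_s(z) = max(0, s - z) and z is the classification margin. The specific margin values are illustrative, not taken from the paper.

```python
def hinge(z, s=1.0):
    # Hinge loss H_s(z) = max(0, s - z); grows without bound as the
    # margin z becomes more negative, so a single mislabeled point can
    # dominate the convex SVM objective.
    return max(0.0, s - z)

def ramp(z, s=-1.0):
    # Ramp loss R_s(z) = H_1(z) - H_s(z): the hinge loss clipped at
    # 1 - s, so a badly mislabeled point (very negative margin) contributes
    # at most a constant, bounding the influence of outliers.
    return hinge(z, 1.0) - hinge(z, s)

for z in (2.0, 0.5, -0.5, -5.0):
    print(f"margin {z:5.1f}: hinge {hinge(z):.1f}, ramp {ramp(z):.1f}")
```

For the well-classified and moderately misclassified points the two losses agree; only for the outlier at margin -5.0 do they diverge (hinge 6.0 versus ramp capped at 2.0), which is the mechanism behind the robustness claimed above.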
INDEX TERMS
Online learning, nonconvex optimization, support vector machines, active learning.
CITATION
Şeyda Ertekin, Léon Bottou, C. Lee Giles, "Nonconvex Online Support Vector Machines," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 2, pp. 368-381, February 2011, doi:10.1109/TPAMI.2010.109
REFERENCES
[1] O. Bousquet and A. Elisseeff, "Stability and Generalization," J. Machine Learning Research, vol. 2, pp. 499-526, 2002.
[2] B. Schölkopf and A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
[3] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004.
[4] C. Cortes and V. Vapnik, "Support Vector Networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[5] L. Mason, P.L. Bartlett, and J. Baxter, "Improved Generalization through Explicit Optimization of Margins," Machine Learning, vol. 38, pp. 243-255, 2000.
[6] N. Krause and Y. Singer, "Leveraging the Margin More Carefully," Proc. Int'l Conf. Machine Learning, p. 63, 2004.
[7] F. Perez-Cruz, A. Navia-Vazquez, and A.R. Figueiras-Vidal, "Empirical Risk Minimization for Support Vector Classifiers," IEEE Trans. Neural Networks, vol. 14, no. 2, pp. 296-303, Mar. 2003.
[8] L. Xu, K. Crammer, and D. Schuurmans, "Robust Support Vector Machine Training via Convex Outlier Ablation," Proc. 21st Nat'l Conf. Artificial Intelligence, 2006.
[9] Y. Liu, X. Shen, and H. Doss, "Multicategory ψ-Learning and Support Vector Machine: Computational Tools," J. Computational and Graphical Statistics, vol. 14, pp. 219-236, 2005.
[10] L. Wang, H. Jia, and J. Li, "Training Robust Support Vector Machine with Smooth Ramp Loss in the Primal Space," Neurocomputing, vol. 71, pp. 3020-3025, 2008.
[11] A.L. Yuille and A. Rangarajan, "The Concave-Convex Procedure (CCCP)," Advances in Neural Information Processing Systems, MIT Press, 2002.
[12] R. Collobert, F. Sinz, J. Weston, and L. Bottou, "Trading Convexity for Scalability," Proc. Int'l Conf. Machine Learning, pp. 201-208, 2006.
[13] A. Bordes, S. Ertekin, J. Weston, and L. Bottou, "Fast Kernel Classifiers with Online and Active Learning," J. Machine Learning Research, vol. 6, pp. 1579-1619, 2005.
[14] S. Ertekin, J. Huang, L. Bottou, and C.L. Giles, "Learning on the Border: Active Learning in Imbalanced Data Classification," Proc. ACM Conf. Information and Knowledge Management, pp. 127-136, 2007.
[15] J.C. Platt, "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," Advances in Kernel Methods: Support Vector Learning, pp. 185-208, MIT Press, 1999.
[16] S. Shalev-Shwartz and N. Srebro, "SVM Optimization: Inverse Dependence on Training Set Size," Proc. Int'l Conf. Machine Learning, pp. 928-935, 2008.
[17] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, and K.R.K. Murthy, "Improvements to Platt's SMO Algorithm for SVM Classifier Design," Neural Computation, vol. 13, no. 3, pp. 637-649, 2001.
[18] O. Chapelle, "Training a Support Vector Machine in the Primal," Neural Computation, vol. 19, no. 5, pp. 1155-1178, 2007.
[19] I. Steinwart, "Sparseness of Support Vector Machines," J. Machine Learning Research, vol. 4, pp. 1071-1105, 2003.
[20] G. Schohn and D. Cohn, "Less Is More: Active Learning with Support Vector Machines," Proc. Int'l Conf. Machine Learning, pp. 839-846, 2000.
[21] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Technical Report 23, Univ. Dortmund, 1997.
[22] S. Tong and D. Koller, "Support Vector Machine Active Learning with Applications to Text Classification," J. Machine Learning Research, vol. 2, pp. 45-66, 2001.
[23] T. Glasmachers and C. Igel, "Second-Order SMO Improves SVM Online and Active Learning," Neural Computation, vol. 20, no. 2, pp. 374-382, 2008.