 Bibliographic References 
Fast SVM Training Algorithm with Decomposition on Very Large Data Sets
April 2005 (vol. 27, no. 4), pp. 603-618
Training a support vector machine on a very large data set with thousands of classes is a challenging problem. This paper proposes an efficient algorithm to solve it. The key idea is to introduce a parallel optimization step that quickly removes most of the nonsupport vectors: block-diagonal matrices are used to approximate the original kernel matrix, so that the original problem can be split into hundreds of subproblems that are solved much more efficiently. In addition, effective strategies such as kernel caching and efficient computation of the kernel matrix are integrated to speed up the training process. Our analysis shows that the time complexity of the proposed algorithm grows linearly with both the number of classes and the size of the data set. The experiments investigate many appealing properties of the algorithm, and the results show that it scales much better than Libsvm, SVMlight, and SVMTorch, while also achieving good generalization performance on several large databases.
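The decomposition idea described in the abstract can be sketched on a toy problem: replace the full kernel matrix with a block-diagonal approximation, solve each block's subproblem independently (the parallelizable step), discard points whose multipliers vanish, and retrain on the survivors. The sketch below is an illustration only, not the paper's algorithm: it uses a simplified bias-free SVM dual solved by plain coordinate ascent on 1-D toy data, and all names (`solve_dual`, `rbf`) and parameter choices are assumptions made for the example.

```python
import math
import random

def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (x - z) ** 2)

def solve_dual(X, y, C=1.0, gamma=1.0, sweeps=200):
    """Coordinate ascent on a simplified bias-free SVM dual:
       max  sum(a) - 0.5 * a'Qa,   subject to 0 <= a_i <= C,
       where Q_ij = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    Q = [[y[i] * y[j] * rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    a = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            grad = 1.0 - sum(Q[i][j] * a[j] for j in range(n))
            # Exact single-coordinate maximizer, clipped to the box [0, C].
            a[i] = min(C, max(0.0, a[i] + grad / Q[i][i]))
    return a

# Toy data: two well-separated 1-D clusters.
random.seed(0)
X = [random.gauss(-2.0, 0.3) for _ in range(20)] + [random.gauss(2.0, 0.3) for _ in range(20)]
y = [-1] * 20 + [1] * 20

# Parallel step: partition the data into blocks and solve each subproblem
# independently -- equivalent to approximating the full kernel matrix by a
# block-diagonal one.  Points with vanishing multipliers are pruned.
order = list(range(len(X)))
random.shuffle(order)
blocks = [order[:20], order[20:]]
candidates = []
for blk in blocks:
    a = solve_dual([X[i] for i in blk], [y[i] for i in blk])
    candidates += [blk[k] for k in range(len(blk)) if a[k] > 1e-8]

# Final step: train once more on the surviving candidates only.
a_final = solve_dual([X[i] for i in candidates], [y[i] for i in candidates])

def decision(x):
    return sum(a_final[k] * y[candidates[k]] * rbf(X[candidates[k]], x)
               for k in range(len(candidates)))

accuracy = sum(1 for i in range(len(X)) if decision(X[i]) * y[i] > 0) / len(X)
print(f"{len(candidates)} candidates kept of {len(X)}; training accuracy {accuracy:.2f}")
```

Each block's subproblem touches only its own rows of the kernel matrix, which is what makes the pruning step cheap and parallelizable; the expensive full-kernel optimization then runs only on the reduced candidate set.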

[1] V.N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[2] N. Cristianini and J.S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[3] B. Schölkopf and A.J. Smola, Learning with Kernels. Cambridge, Mass.: MIT Press, 2002.
[4] B. Schölkopf, C.J.C. Burges, and V.N. Vapnik, “Extracting Support Data for a Given Task,” Proc. First Int'l Conf. Knowledge Discovery and Data Mining, pp. 252-257, 1995.
[5] D. DeCoste and B. Schölkopf, “Training Invariant Support Vector Machines,” Machine Learning, vol. 46, nos. 1-3, pp. 161-190, 2002.
[6] J.X. Dong, C.Y. Suen, and A. Krzyżak, “A Fast SVM Training Algorithm,” Int'l J. Pattern Recognition and Artificial Intelligence, vol. 17, no. 3, pp. 367-384, 2003.
[7] T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” Proc. ECML-98, 10th European Conf. Machine Learning, pp. 137-142, 1998.
[8] E. Osuna, R. Freund, and F. Girosi, “Training Support Vector Machines: An Application to Face Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 130-136, 1997.
[9] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Clarendon Press, 1995.
[10] J. Verbeek, N. Vlassis, and B. Kröse, “Efficient Greedy Learning of Gaussian Mixture Models,” Neural Computation, vol. 15, no. 2, pp. 469-485, Feb. 2003.
[11] A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., Series B, vol. 39, no. 1, pp. 1-38, 1977.
[12] J.C. Platt, “Fast Training of Support Vector Machines Using Sequential Minimal Optimization,” Advances in Kernel Methods: Support Vector Machines, B. Schölkopf, C.J.C. Burges, and A. Smola, eds., pp. 185-208, Cambridge, Mass.: MIT Press, Dec. 1998.
[13] S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, and K.R.K. Murthy, “Improvements to Platt's SMO Algorithm for SVM Classifier Design,” Neural Computation, vol. 13, pp. 637-649, Mar. 2001.
[14] T. Joachims, “Making Large-Scale Support Vector Machine Learning Practical,” Advances in Kernel Methods: Support Vector Machines, B. Schölkopf, C.J.C. Burges, and A. Smola, eds., pp. 169-184, Cambridge, Mass.: MIT Press, Dec. 1998.
[15] H. Kuhn and A. Tucker, “Nonlinear Programming,” Proc. Second Berkeley Symp. Math. Statistics and Probability, pp. 481-492, 1951.
[16] C.C. Chang and C.J. Lin, “Libsvm: A Library for Support Vector Machines,” technical report, Dept. of Computer Science and Information Eng., Nat'l Taiwan Univ., 2003.
[17] G.W. Flake and S. Lawrence, “Efficient SVM Regression Training with SMO,” Machine Learning, vol. 46, nos. 1-3, pp. 271-290, Mar. 2002.
[18] R. Collobert, S. Bengio, and Y. Bengio, “A Parallel Mixture of SVMs for Very Large Scale Problems,” Neural Computation, vol. 14, no. 5, pp. 1105-1114, 2002.
[19] R. Collobert and S. Bengio, “SVMTorch: Support Vector Machines for Large-Scale Regression Problems,” J. Machine Learning Research, vol. 1, pp. 143-160, 2001.
[20] A. Rida, A. Labbi, and C. Pellegrini, “Local Experts Combination through Density Decomposition,” Proc. Seventh Int'l Workshop AI and Statistics, D. Heckerman and J. Whittaker, eds., Jan. 1999.
[21] V. Tresp, “A Bayesian Committee Machine,” Neural Computation, vol. 12, no. 11, pp. 2719-2741, 2000.
[22] A. Schwaighofer and V. Tresp, “The Bayesian Committee Support Vector Machine,” Proc. Int'l Conf. Artificial Neural Networks, pp. 411–417, 2001.
[23] D.A. Patterson and J.L. Hennessy, Computer Architecture: A Quantitative Approach, second ed. San Francisco: Morgan Kaufmann, 1996.
[24] J.J. Dongarra, J. Du Croz, I.S. Duff, and S. Hammarling, “A Set of Level 3 Basic Linear Algebra Subprograms,” ACM Trans. Math. Software, vol. 16, pp. 1-17, 1990.
[25] R.C. Whaley, A. Petitet, and J.J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” technical report, Dept. of Computer Science, Univ. of Tennessee, 2000.
[26] B.E. Boser, I.M. Guyon, and V.N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Proc. Fifth Ann. ACM Workshop Computational Learning Theory, D. Haussler, ed., pp. 144-152, 1992.
[27] S.S. Keerthi and E.G. Gilbert, “Convergence of a Generalized SMO Algorithm for SVM Classifier Design,” Machine Learning, vol. 46, no. 3, pp. 351-360, Mar. 2002.
[28] Intel Corporation, “The IA-32 Intel Architecture Software Developer's Manual,” Vol. 1: Basic Architecture, order number 245470, 2002.
[29] J.E.R. Staddon, Adaptive Behavior and Learning. Cambridge, U.K.: Cambridge Univ. Press, 1983.
[30] B.V. Gnedenko, Y.K. Belyayev, and A.D. Solovyev, Mathematical Methods of Reliability Theory. New York: Academic Press, 1969.
[31] A. Ben-Hur, H.T. Siegelmann, and S. Fishman, “Complexity for Continuous Time Systems,” J. Complexity, vol. 18, no. 1, pp. 51-86, 2002.
[32] Y. LeCun, L.D. Jackel, L. Bottou, J.S. Denker, H. Drucker, I. Guyon, U.A. Müller, E. Sackinger, P. Simard, and V.N. Vapnik, “Comparison of Learning Algorithms for Handwritten Digit Recognition,” Proc. Int'l Conf. Artificial Neural Networks, F. Fogelman and P. Gallinari, eds., pp. 53-60, 1995.
[33] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. New York: Academic Press, 1990.
[34] J.X. Dong, A. Krzyżak, and C.Y. Suen, “High Accuracy Handwritten Chinese Character Recognition Using Support Vector Machine,” Proc. Int'l Workshop Artificial Neural Networks on Pattern Recognition, 2003.
[35] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. New York: Wiley, 2001.
[36] R.C. Whaley and J.J. Dongarra, “Automatically Tuned Linear Algebra Software (ATLAS),” Proc. High Performance Networking and Computing, 1998.
[37] Intel Corporation, “Intel Pentium 4 and Intel Xeon Processor Optimization Reference Manual,” order number: 248966, 2002.
[38] L. Breiman, “Bias, Variance, and Arcing Classifiers,” Technical Report 460, Statistics Dept., Univ. of California, Berkeley, Apr. 1996.
[39] R. Collobert, Y. Bengio, and S. Bengio, “Scaling Large Learning Problems with Hard Parallel Mixtures,” Int'l J. Pattern Recognition and Artificial Intelligence, vol. 17, no. 3, pp. 349-365, 2003.
[40] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, no. 1, pp. 49-64, 1996.
[41] K. Fukunaga and T.F. Krile, “Calculation of Bayes' Recognition Error for Two Multivariate Gaussian Distributions,” IEEE Trans. Computers, vol. 18, no. 3, pp. 220–229, Mar. 1969.
[42] J.A. Blackard and D.J. Dean, “Comparative Accuracies of Artificial Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables,” Computers and Electronics in Agriculture, vol. 24, pp. 131-151, 1999.

Index Terms:
Support vector machines (SVMs), algorithm design and analysis, algorithm efficiency, machine learning, handwritten character recognition.
Jian-xiong Dong, Adam Krzyzak, Ching Y. Suen, "Fast SVM Training Algorithm with Decomposition on Very Large Data Sets," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 4, pp. 603-618, April 2005, doi:10.1109/TPAMI.2005.77