Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification
September 2002 (vol. 24 no. 9)
pp. 1184-1199

Abstract—We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm—one-class leveraging—starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
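The returned function described in the abstract — a convex combination of base hypotheses that scores how likely a test point is to come from the training distribution — can be illustrated with a small sketch. This is a conceptual toy, not the paper's one-class leveraging algorithm: the Gaussian-bump base hypotheses and the uniform convex weights are illustrative assumptions (a real leveraging scheme would select hypotheses and learn the weights iteratively).

```python
import numpy as np

rng = np.random.default_rng(0)

# Training sample from the "normal" distribution (here: a 2D standard Gaussian).
X_train = rng.normal(0.0, 1.0, size=(200, 2))

def base_hypothesis(center, width=1.0):
    # Illustrative base hypothesis: a Gaussian bump centered at a training point.
    return lambda x: np.exp(-np.sum((x - center) ** 2, axis=-1) / (2.0 * width ** 2))

# A small pool of base hypotheses (hypothetical choice of centers).
hypotheses = [base_hypothesis(c) for c in X_train[:20]]

# Convex combination: weights are nonnegative and sum to one.
# Uniform weights stand in for the learned coefficients.
alphas = np.full(len(hypotheses), 1.0 / len(hypotheses))

def score(x):
    """Higher score = more plausibly generated by the training distribution."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return float(sum(a * h(x)[0] for a, h in zip(alphas, hypotheses)))

inlier_score = score([0.0, 0.0])   # near the bulk of the training data
outlier_score = score([6.0, 6.0])  # far from every training point
```

Thresholding this score (flag `x` as novel when `score(x)` falls below some `tau`) yields the one-class decision the abstract describes.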

[1] H. Drucker, R. Schapire, and P.Y. Simard, “Boosting Performance in Neural Networks,” Int'l J. Pattern Recognition and Artificial Intelligence, vol. 7, pp. 705-719, 1993.
[2] Y.A. LeCun, L.D. Jackel, L. Bottou, A. Brunot, C. Cortes, J.S. Denker, H. Drucker, I. Guyon, U.A. Müller, E. Säckinger, P.Y. Simard, and V.N. Vapnik, “Comparison of Learning Algorithms for Handwritten Digit Recognition,” Proc. Int'l Conf. Artificial Neural Networks '95, F. Fogelman-Soulié and P. Gallinari, eds., vol. II, pp. 53-60, 1995.
[3] R. Maclin and D. Opitz, “An Empirical Evaluation of Bagging and Boosting,” Proc. 14th Nat'l Conf. Artificial Intelligence, pp. 546-551, 1997.
[4] H. Schwenk and Y. Bengio, “AdaBoosting Neural Networks,” Proc. Int'l Conf. Artificial Neural Networks '97, W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud, eds., vol. 1327, pp. 967-972, 1997.
[5] E. Bauer and R. Kohavi, “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants,” Machine Learning, vol. 36, pp. 105-142, 1999.
[6] T.G. Dietterich, “An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization,” Machine Learning, vol. 40, no. 2, pp. 139-157, 2000.
[7] N. Duffy and D.P. Helmbold, “Leveraging for Regression,” Proc. 13th Ann. Conf. Computational Learning Theory, pp. 208-219, 2000.
[8] G. Rätsch, M. Warmuth, S. Mika, T. Onoda, S. Lemm, and K.-R. Müller, “Barrier Boosting,” Proc. 13th Ann. Conf. Computational Learning Theory, pp. 170-179, 2000.
[9] A. Demiriz, K.P. Bennett, and J. Shawe-Taylor, “Linear Programming Boosting via Column Generation,” J. Machine Learning Research, 2001 (special issue on support vector machines and kernel methods).
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Press, New York, 1994.
[11] C.M. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, 1995.
[12] “Neural Networks: Tricks of the Trade,” Lecture Notes in Computer Science, G. Orr and K.-R. Müller, eds., vol. 1524, 1998.
[13] B.E. Boser, I.M. Guyon, and V.N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Proc. Fifth Ann. Workshop Computational Learning Theory, pp. 144-152, 1992.
[14] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
[15] B. Schölkopf, Support Vector Learning. Munich: Oldenbourg Verlag, 1997.
[16] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
[17] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[18] B. Schölkopf and A.J. Smola, Learning with Kernels. Cambridge, Mass.: MIT Press, 2002.
[19] C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 1-47, 1998.
[20] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An Introduction to Kernel-Based Learning Algorithms,” IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.
[21] D.G. Luenberger, Linear and Nonlinear Programming, second ed. Reading, Mass.: Addison-Wesley, May 1984.
[22] B. Schölkopf, J. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, “Estimating the Support of a High-Dimensional Distribution,” Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.
[23] D. Tax and R. Duin, “Data Domain Description by Support Vectors,” Proc. European Symp. Artificial Neural Networks, M. Verleysen, ed., pp. 251-256, 1999.
[24] C. Campbell and K.P. Bennett, “A Linear Programming Approach to Novelty Detection,” Advances in Neural Information Processing Systems, T.K. Leen, T.G. Dietterich, and V. Tresp, eds., vol. 13, pp. 395-401, 2001.
[25] R.E. Schapire, Y. Freund, P.L. Bartlett, and W.S. Lee, “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods,” The Annals of Statistics, vol. 26, no. 5, pp. 1651-1686, Oct. 1998.
[26] Y. Freund and R.E. Schapire, “A Short Introduction to Boosting,” J. Japanese Soc. Artificial Intelligence, vol. 14, no. 5, pp. 771-780, Sept. 1999. (Appeared in Japanese, translation by Naoki Abe.)
[27] R.E. Schapire and Y. Singer, “Improved Boosting Algorithms Using Confidence-Rated Predictions,” Proc. Ann. Conf. Computational Learning Theory '98, pp. 80-91, 1998.
[28] O.L. Mangasarian, “Arbitrary-Norm Separating Plane,” Operations Research Letters, vol. 24, no. 1, pp. 15-23, 1999.
[29] L.G. Valiant, “A Theory of the Learnable,” Comm. ACM, vol. 27, no. 11, pp. 1134-1142, Nov. 1984.
[30] R.E. Schapire, “The Design and Analysis of Efficient Learning Algorithms,” PhD thesis, MIT Press, 1992.
[31] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” Proc. EuroCOLT: European Conf. Computational Learning Theory, 1994.
[32] L. Breiman, “Prediction Games and Arcing Algorithms,” Neural Computation, vol. 11, no. 7, pp. 1493-1518, 1999. (Also Technical Report 504, Statistics Dept., Univ. of Calif., Berkeley.)
[33] G. Rätsch and M.K. Warmuth, “Marginal Boosting,” NeuroCOLT2 Technical Report 97, Royal Holloway College, London, July 2001. (extended version accepted for COLT '02).
[34] Y. Freund and R. Schapire, “Game Theory, On-Line Prediction and Boosting,” Proc. Ann. Conf. Computational Learning Theory, pp. 325-332, 1996.
[35] G. Rätsch, “Robust Boosting via Convex Optimization,” PhD thesis, Univ. of Potsdam, 2001.
[36] K.P. Bennett and O.L. Mangasarian, “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets,” Optimization Methods and Software, vol. 1, pp. 23-34, 1992.
[37] S. Chen, D. Donoho, and M. Saunders, “Atomic Decomposition by Basis Pursuit,” SIAM J. Scientific Computing, vol. 20, no. 1, pp. 33-61, 1999.
[38] P. Bradley, O. Mangasarian, and J. Rosen, “Parsimonious Least Norm Approximation,” Computational Optimization and Applications, vol. 11, no. 1, pp. 5-21, 1998.
[39] G. Rätsch, T. Onoda, and K.-R. Müller, “Soft Margins for AdaBoost,” Machine Learning, vol. 42, no. 3, pp. 287-320, Mar. 2001. (Also NeuroCOLT Technical Report NC-TR-1998-021.)
[40] O.L. Mangasarian, “Mathematical Programming in Data Mining,” Data Mining and Knowledge Discovery, vol. 42, no. 1, pp. 183-201, 1997.
[41] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear Component Analysis as a Kernel Eigenvalue Problem,” Neural Computation, vol. 10, pp. 1299-1319, 1998.
[42] G. Rätsch, B. Schölkopf, A.J. Smola, S. Mika, T. Onoda, and K.-R. Müller, “Robust Ensemble Learning,” Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, eds., pp. 207-219, Cambridge, Mass.: MIT Press, 2000.
[43] R.E. Schapire, “The Strength of Weak Learnability,” Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[44] N. Duffy and D.P. Helmbold, “A Geometric Approach to Leveraging Weak Learners,” Proc. Computational Learning Theory: Fourth European Conf. (EuroCOLT '99), P. Fischer and H.U. Simon, eds., pp. 18-33, Mar. 1999.
[45] A.J. Grove and D. Schuurmans, “Boosting in the Limit: Maximizing the Margin of Learned Ensembles,” Proc. 15th Nat'l Conf. Artificial Intelligence, 1998.
[46] J. Kivinen and M. Warmuth, “Boosting as Entropy Projection,” Proc. 12th Ann. Conf. Computational Learning Theory, pp. 134-144, 1999.
[47] J. Friedman, T. Hastie, and R.J. Tibshirani, “Additive Logistic Regression: A Statistical View of Boosting,” Annals of Statistics, vol. 28, no. 2, pp. 337-374, 2000. (With discussion, pp. 375-407; also technical report, Dept. of Statistics, Sequoia Hall, Stanford Univ.)
[48] L. Mason, J. Baxter, P.L. Bartlett, and M. Frean, “Functional Gradient Techniques for Combining Hypotheses,” Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, eds., pp. 221-247, Cambridge, Mass.: MIT Press, 2000.
[49] G. Rätsch, S. Mika, and M.K. Warmuth, “On the Convergence of Leveraging,” NeuroCOLT2 Technical Report 98, Royal Holloway College, London, Aug. 2001. (a shorter version accepted for NIPS '01).
[50] T. Zhang, “A General Greedy Approximation Algorithm with Applications,” Advances in Neural Information Processing Systems, vol. 14, MIT Press, 2002 (in press).
[51] L.M. Bregman, “The Relaxation Method for Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming,” USSR Computational Math. and Math. Physics, vol. 7, pp. 200-217, 1967.
[52] Y. Censor and S.A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications. Numerical Math. and Scientific Computation series, Oxford Univ. Press, 1997.
[53] M. Collins, R.E. Schapire, and Y. Singer, “Logistic Regression, AdaBoost and Bregman Distances,” Proc. Ann. Conf. Computational Learning Theory, pp. 158-169, 2000.
[54] S. Nash and A. Sofer, Linear and Nonlinear Programming. New York: McGraw-Hill, 1996.
[55] G. Rätsch, A. Demiriz, and K. Bennett, “Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces,” Machine Learning, vol. 48, nos. 1-3, pp. 193-221, 2002. (Also NeuroCOLT2 Technical Report NC-TR-2000-085.)
[56] R. Cominetti and J.-P. Dussault, “A Stable Exponential Penalty Algorithm with Superlinear Convergence,” J. Optimization Theory and Applications, vol. 83, no. 2, Nov. 1994.
[57] M. Doljansky and M. Teboulle, “An Interior Proximal Algorithm and the Exponential Multiplier Method for Semidefinite Programming,” SIAM J. Optimization, vol. 9, no. 1, pp. 1-13, 1998.
[58] L. Mosheyev and M. Zibulevsky, “Penalty/Barrier Multiplier Algorithm for Semidefinite Programming,” Optimization Methods and Software, 1999.
[59] Z.-Q. Luo and P. Tseng, “On the Convergence of Coordinate Descent Method for Convex Differentiable Minimization,” J. Optimization Theory and Applications, vol. 72, no. 1, pp. 7-35, 1992.
[60] S. Della Pietra, V. Della Pietra, and J. Lafferty, “Inducing Features of Random Fields,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 380-393, Apr. 1997.
[61] T. Poggio and F. Girosi, “Extensions of a Theory of Networks for Approximation and Learning: Dimensionality Reduction and Clustering,” Technical Report AIM-1167, MIT-AI Lab, Mar. 1990.
[62] A.J. Smola, “Learning with Kernels,” PhD thesis, Technische Universität Berlin, 1998.
[63] S.L. Salzberg, “A Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA,” Computer Applications in the Biosciences, vol. 13, no. 4, pp. 365-376, 1997.
[64] S. Sonnenburg, G. Rätsch, A. Jagota, and K.-R. Müller, “Splice-Site Recognition with Support Vector Machines,” to be published in Proc. Int'l Conf. Artificial Neural Networks '02, 2002.
[65] P. Hayton, B. Schölkopf, L. Tarassenko, and P. Anuzis, “Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra,” Advances in Neural Information Processing Systems, T.K. Leen, T.G. Dietterich, and V. Tresp, eds., vol. 13, pp. 946-952, 2001.

Index Terms:
Boosting, SVMs, one-class classification, unsupervised learning, novelty detection.
Gunnar Rätsch, Sebastian Mika, Bernhard Schölkopf, Klaus-Robert Müller, "Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1184-1199, Sept. 2002, doi:10.1109/TPAMI.2002.1033211