
Gunnar Rätsch, Sebastian Mika, Bernhard Schölkopf, Klaus-Robert Müller, "Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1184-1199, September 2002.
Abstract—We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm—one-class leveraging—starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
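The central object the abstract describes — a function built as a convex combination of base hypotheses that scores whether a test point resembles the training data — can be illustrated with a deliberately simple toy. This sketch is not the paper's one-class leveraging algorithm (which selects hypotheses and weights via barrier optimization); it merely shows the form of the returned function. All names, the interval-indicator base hypotheses, and the uniform weighting are illustrative assumptions.

```python
# Toy sketch (NOT the paper's algorithm): a convex combination of base
# hypotheses used as a one-class score. Base hypotheses and weights are
# illustrative assumptions; the paper learns both via barrier methods.

def make_base_hypothesis(center, width=1.0):
    """Base hypothesis h_t: indicator of an interval around a training point."""
    return lambda x: 1.0 if abs(x - center) <= width else 0.0

def fit_convex_combination(train_points):
    """Return score(x) = sum_t w_t * h_t(x), with w_t >= 0 and sum_t w_t = 1.

    Here we simply place one base hypothesis per training point and use
    uniform weights, so the combination is trivially convex.
    """
    n = len(train_points)
    hypotheses = [make_base_hypothesis(c) for c in train_points]
    weights = [1.0 / n] * n  # nonnegative, sums to 1: a convex combination
    def score(x):
        return sum(w * h(x) for w, h in zip(weights, hypotheses))
    return score

if __name__ == "__main__":
    train = [0.0, 0.5, 1.0, 1.5]
    score = fit_convex_combination(train)
    # Points near the training data score high; distant points score 0.
    print(score(0.7))   # covered by all four intervals -> 1.0
    print(score(10.0))  # far from the training data -> 0.0
```

A threshold on `score(x)` then turns the combination into a novelty detector: points scoring below the threshold are flagged as unlikely to come from the training distribution, which is the role the learned function plays in the paper's one-class setting.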