Online Kernel Principal Component Analysis: A Reduced-Order Model
Sept. 2012 (vol. 34, no. 9)
pp. 1814-1826
P. Honeine, Laboratoire de Modélisation et Sûreté des Systèmes, Université de Technologie de Troyes, Troyes, France
Kernel principal component analysis (kernel-PCA) is an elegant nonlinear extension of one of the most widely used data analysis and dimensionality reduction techniques, principal component analysis. In this paper, we propose an online algorithm for kernel-PCA. To this end, we examine a kernel-based version of Oja's rule, initially put forward to extract a linear principal axis. As with most kernel-based machines, the model order equals the number of available observations. To provide an online scheme, we propose to control the model order. We discuss theoretical results, such as an upper bound on the error of approximating the principal functions with the reduced-order model. We derive a recursive algorithm to discover the first principal axis, and extend it to multiple axes. Experimental results demonstrate the effectiveness of the proposed approach, both on a synthetic data set and on images of handwritten digits, with comparisons to classical kernel-PCA and iterative kernel-PCA.
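To make the idea concrete, the following is a minimal Python/NumPy sketch of a kernelized Oja update for the first principal axis, with the model order controlled by a coherence test on a dictionary of retained samples, in the style of criterion common in online kernel methods. The Gaussian kernel choice, the threshold mu0, the least-squares fallback, and all function names are illustrative assumptions, not the paper's exact reduced-order scheme.

import numpy as np

def gaussian_kernel(X, Y, sigma):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y.
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def online_kpca_first_axis(X, sigma=1.0, eta=0.05, mu0=0.5):
    # Track the first kernel principal axis from a stream of samples.
    # The axis is represented as psi(.) = sum_j alpha[j] * k(., D[j]),
    # where the dictionary D is kept small by a coherence test: a new
    # sample joins D only if max_j |k(x, D[j])| <= mu0.
    # (Illustrative sketch, not the paper's exact algorithm.)
    D = X[:1]                    # dictionary, initialized with the first sample
    alpha = np.array([1.0])      # expansion coefficients of the axis
    for x in X[1:]:
        x = x[None, :]
        k = gaussian_kernel(D, x, sigma)[:, 0]   # k(D_j, x) for all j
        y = alpha @ k                            # projection y = psi(x)
        # Oja's rule in the RKHS: psi <- (1 - eta*y^2)*psi + eta*y*k(., x)
        if np.max(np.abs(k)) <= mu0:
            # Low coherence: grow the model with the new kernel function.
            alpha = np.append((1.0 - eta * y**2) * alpha, eta * y)
            D = np.vstack([D, x])
        else:
            # High coherence: keep the order fixed; approximate k(., x) by
            # its least-squares projection onto the dictionary span.
            K = gaussian_kernel(D, D, sigma)
            beta = np.linalg.solve(K + 1e-8 * np.eye(len(D)), k)
            alpha = (1.0 - eta * y**2) * alpha + eta * y * beta
    # Normalize the axis to unit norm in the RKHS: alpha' K alpha = 1.
    K = gaussian_kernel(D, D, sigma)
    alpha = alpha / np.sqrt(alpha @ K @ alpha)
    return D, alpha

Running online_kpca_first_axis on a sample stream keeps the expansion order bounded by the dictionary size rather than by the number of observations, which is the reduced-order property the abstract refers to; extending the sketch to multiple axes would require an additional deflation or orthogonalization step.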

Index Terms:
principal component analysis, kernel-PCA, online algorithm, reduced-order model, machine learning, reproducing kernel, Oja's rule, linear principal axis extraction, dimensionality reduction, data analysis, function approximation, recursive algorithm, dictionaries, eigenvalues and eigenfunctions, handwritten digit images, training data
Citation:
P. Honeine, "Online Kernel Principal Component Analysis: A Reduced-Order Model," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1814-1826, Sept. 2012, doi:10.1109/TPAMI.2011.270