Principal Surfaces from Unsupervised Kernel Regression
September 2005 (vol. 27 no. 9)
pp. 1379-1391
We propose a nonparametric approach to learning principal surfaces based on an unsupervised formulation of the Nadaraya-Watson kernel regression estimator. Compared with previous approaches to principal curves and surfaces, the new method offers several advantages. First, it provides a practical solution to the model selection problem because all parameters can be estimated by leave-one-out cross-validation without additional computational cost. In addition, our approach allows for a convenient incorporation of nonlinear spectral methods for parameter initialization, beyond classical initializations based on linear PCA. Furthermore, it provides a simple way to fit principal surfaces in general feature spaces, beyond the usual data space setup. Experimental results illustrate these features on simulated and real data.
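To make the idea concrete, the sketch below illustrates the core quantity of unsupervised kernel regression as described in the abstract: the Nadaraya-Watson estimator with the latent coordinates treated as free parameters, and a leave-one-out reconstruction error obtained at no extra cost by zeroing the diagonal of the kernel matrix. This is a minimal NumPy illustration, not the authors' implementation; the Gaussian kernel, the bandwidth `h`, and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(d2, h):
    # Gaussian kernel evaluated on squared distances, bandwidth h (assumed form)
    return np.exp(-0.5 * d2 / h**2)

def ukr_loo_error(X, Y, h=1.0):
    """Leave-one-out UKR reconstruction error (illustrative sketch).

    X : (N, q) latent coordinates -- the free parameters in UKR
    Y : (N, D) observed data
    Returns the mean squared leave-one-out reconstruction error.
    """
    # Pairwise squared distances between latent points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = gaussian_kernel(d2, h)
    # Leave-one-out: remove each point's own contribution, which also
    # prevents the trivial solution where every point reconstructs itself.
    np.fill_diagonal(K, 0.0)
    W = K / K.sum(axis=1, keepdims=True)   # Nadaraya-Watson weights
    Y_hat = W @ Y                          # LOO reconstruction of each sample
    return np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
```

In this reading, fitting the principal surface amounts to minimizing `ukr_loo_error` with respect to the latent coordinates `X` (e.g., by gradient descent), starting from a PCA or spectral-embedding initialization as the abstract suggests.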

[1] P. Meinicke, “Unsupervised Learning in a Generalized Regression Framework,” PhD dissertation, Universität Bielefeld, 2000.
[2] T. Hastie, “Principal Curves and Surfaces,” PhD dissertation, Stanford Univ., 1984.
[3] B. Kégl, A. Krzyzak, T. Linder, and K. Zeger, “Learning and Design of Principal Curves,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 3, pp. 281-297, Mar. 2000.
[4] M. LeBlanc and R. Tibshirani, “Adaptive Principal Surfaces,” J. Am. Statistical Assoc., vol. 89, pp. 53-64, 1994.
[5] A.J. Smola, S. Mika, B. Schölkopf, and R.C. Williamson, “Regularized Principal Manifolds,” J. Machine Learning Research, vol. 1, pp. 179-209, 2001.
[6] K. Chang and J. Ghosh, “A Unified Model for Probabilistic Principal Surfaces,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 1, pp. 22-41, Jan. 2001.
[7] C.M. Bishop, M. Svensen, and C.K.I. Williams, “GTM: The Generative Topographic Mapping,” Neural Computation, vol. 10, no. 1, pp. 215-234, 1998.
[8] T. Kohonen, Self-Organizing Maps. Springer, 1995.
[9] H. Ritter, T. Martinetz, and K. Schulten, Neural Computation and Self-Organizing Maps. Addison Wesley, 1992.
[10] J. Walter and H. Ritter, “Rapid Learning with Parametrized Self-Organizing Maps,” Neurocomputing, vol. 12, pp. 131-153, 1996.
[11] C.M. Bishop, M. Svensén, and C.K.I. Williams, “Developments of the Generative Topographic Mapping,” Neurocomputing, vol. 21, pp. 203-224, 1998.
[12] B. Schölkopf and A.J. Smola, Learning with Kernels. MIT Press, 2002.
[13] E.A. Nadaraya, “On Estimating Regression,” Theory of Probability and Its Applications, vol. 10, pp. 186-190, 1964.
[14] G. Watson, “Smooth Regression Analysis,” Sankhya Series A, vol. 26, pp. 359-372, 1964.
[15] D.W. Scott, Multivariate Density Estimation. Wiley, 1992.
[16] T. Hastie, R. Tibshirani, and J.H. Friedman, The Elements of Statistical Learning. Springer-Verlag, 2001.
[17] S. Sandilya and S.R. Kulkarni, “Principal Curves with Bounded Turn,” IEEE Trans. Information Theory, vol. 48, pp. 2789-2793, 2002.
[18] S. Roweis and L. Saul, “Nonlinear Dimensionality Reduction by Locally Linear Embedding,” Science, vol. 290, pp. 2323-2326, 2000.
[19] J.B. Tenenbaum, V. de Silva, and J.C. Langford, “A Global Geometric Framework for Nonlinear Dimensionality Reduction,” Science, vol. 290, pp. 2319-2323, 2000.
[20] M. Belkin and P. Niyogi, “Laplacian Eigenmaps for Dimensionality Reduction and Data Representation,” Neural Computation, vol. 15, no. 6, pp. 1373-1396, June 2003.
[21] R. Horst and P.M. Pardalos, eds., Handbook of Global Optimization. Kluwer Academic Publishers, 1995.
[22] G.H. Bakir, J. Weston, and B. Schölkopf, “Learning to Find Pre-Images,” Advances in Neural Information Processing Systems, 2003.
[23] J.T. Kwok and I.W. Tsang, “Finding the Pre-Images in Kernel Principal Component Analysis,” Proc. Sixth Ann. Workshop Kernel Machines, 2002.
[24] J. Weston, O. Chapelle, A. Elisseeff, B. Schölkopf, and V. Vapnik, “Kernel Dependency Estimation,” Advances in Neural Information Processing Systems 15, 2003.
[25] B. Silverman, Density Estimation for Statistics and Data Analysis. London-New York: Chapman and Hall, 1986.
[26] J.C. Lagarias, J.A. Reeds, M.H. Wright, and P.E. Wright, “Convergence Properties of the Nelder-Mead Simplex Algorithm in Low Dimensions,” SIAM J. Optimization, vol. 9, pp. 112-147, 1998.
[27] M. Riedmiller and H. Braun, “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm,” Proc. IEEE Int'l Conf. Neural Networks, pp. 586-591, 1993.
[28] J.W. Sammon, Jr., “A Non-Linear Mapping for Data Structure Analysis,” IEEE Trans. Computers, vol. 18, pp. 401-409, 1969.
[29] B. Schölkopf, A.J. Smola, and K.-R. Müller, “Nonlinear Component Analysis as a Kernel Eigenvalue Problem,” Neural Computation, vol. 10, pp. 1299-1319, 1998.
[30] Y. Bengio, O. Delalleau, N. Le Roux, J.-F. Paiement, P. Vincent, and M. Ouimet, “Learning Eigenfunctions Links Spectral Embedding and Kernel PCA,” Neural Computation, vol. 16, pp. 2197-2219, 2004.

Index Terms: Dimensionality reduction, principal curves, principal surfaces, density estimation, model selection, kernel methods.
Peter Meinicke, Stefan Klanke, Roland Memisevic, Helge Ritter, "Principal Surfaces from Unsupervised Kernel Regression," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 9, pp. 1379-1391, Sept. 2005, doi:10.1109/TPAMI.2005.183