A Unified Model for Probabilistic Principal Surfaces
January 2001 (vol. 23 no. 1)
pp. 22-41

Abstract—Principal curves and surfaces are nonlinear generalizations of principal components and subspaces, respectively. They can provide an insightful summary of high-dimensional data not typically attainable by classical linear methods. Solutions to several problems faced by the original principal curve formulation, such as proofs of existence and convergence, have been proposed in the past few years. Nevertheless, these solutions are not generally extensible to principal surfaces, the mere computation of which presents a formidable obstacle. Consequently, relatively few studies of principal surfaces are available. Recently, we proposed the probabilistic principal surface (PPS) to address a number of issues associated with current principal surface algorithms. PPS uses a manifold-oriented covariance noise model, based on the generative topographic mapping (GTM), which can be viewed as a parametric formulation of Kohonen's self-organizing map. Building on the PPS, we introduce a unified covariance model that implements PPS $\left( 0<\alpha<1\right) $, GTM $\left( \alpha=1\right) $, and the manifold-aligned GTM $\left( \alpha>1\right) $ by varying the clamping parameter $\alpha$. We then comprehensively evaluate the empirical performance (reconstruction error) of PPS, GTM, and the manifold-aligned GTM on three popular benchmark data sets. Two different comparisons show that the PPS outperforms the GTM under identical parameter settings. Convergence of the PPS is found to be identical to that of the GTM, and the computational overhead incurred by the PPS decreases to $40$ percent or less for more complex manifolds. These results show that the generalized PPS provides a flexible and effective way of obtaining principal surfaces.
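The unified covariance model described in the abstract can be illustrated with a short sketch. The specific oriented form below (variance $\alpha/\beta$ along the $Q$ manifold directions and $(D-\alpha Q)/(\beta(D-Q))$ along the remaining orthogonal directions, so the total noise $D/\beta$ is preserved for every $\alpha$) is an assumption based on the PPS papers cited; the orthonormal basis `E` at a latent point is taken as given here rather than estimated.

```python
import numpy as np

def pps_covariance(E, alpha, beta, Q):
    """Sketch of the unified PPS noise covariance (form assumed from [15], [50]).

    E     : (D, D) orthonormal basis; the first Q columns span the manifold
            tangent directions at a latent point (assumed precomputed).
    alpha : clamping parameter; 0 < alpha < 1 gives PPS, alpha = 1 the
            spherical GTM, alpha > 1 the manifold-aligned GTM.
    beta  : inverse noise variance.
    Q     : latent (manifold) dimensionality, Q < D.
    """
    D = E.shape[0]
    E_par = E[:, :Q]    # directions along the manifold
    E_perp = E[:, Q:]   # directions orthogonal to the manifold
    # Clamp variance alpha/beta onto the manifold; distribute the rest
    # orthogonally so that the trace is D/beta for every alpha.
    return (alpha / beta) * E_par @ E_par.T \
        + ((D - alpha * Q) / (beta * (D - Q))) * E_perp @ E_perp.T

# With alpha = 1, the model collapses to the isotropic GTM covariance I/beta:
E = np.linalg.qr(np.random.default_rng(0).standard_normal((5, 5)))[0]
Sigma = pps_covariance(E, alpha=1.0, beta=2.0, Q=2)
assert np.allclose(Sigma, np.eye(5) / 2.0)
```

Setting `alpha` below 1 shrinks noise along the manifold relative to the orthogonal directions, which is the oriented behavior that distinguishes PPS from the spherical GTM.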

[1] J.H. Friedman, “An Overview of Predictive Learning and Function Approximation,” From Statistics to Neural Networks, Proc. NATO/ASI Workshop, V. Cherkassky, J.H. Friedman, and H. Wechsler, eds., pp. 1-61, 1994.
[2] R. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics, vol. 7, part II, pp. 179-188, 1936.
[3] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis. Prentice Hall, 1988.
[4] A. Hyvärinen, “Survey on Independent Component Analysis,” Neural Computing Surveys, vol. 2, pp. 94-128, 1999.
[5] J.H. Friedman, “Exploratory Projection Pursuit,” J. Am. Statistical Assoc., vol. 82, no. 397, pp. 249-266, Mar. 1987.
[6] L. Fahrmeir and G. Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models. New York: Springer-Verlag, 1994.
[7] G.W. Cottrell, P. Munroe, and D. Zipser, “Image Compression by Back Propagation: An Example of Extensional Programming,” Technical Report ICS 8702, Univ. of California at San Diego, 1987.
[8] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[9] T. Hastie and W. Stuetzle, “Principal Curves,” J. Am. Statistical Assoc., vol. 84, no. 406, pp. 502-516, June 1989.
[10] M. LeBlanc and R. Tibshirani, “Adaptive Principal Surfaces,” J. Am. Statistical Assoc., vol. 89, no. 425, pp. 53-64, Mar. 1994.
[11] C.M. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, 1995.
[12] M.E. Tipping and C.M. Bishop, “Mixtures of Probabilistic Principal Component Analysers,” Technical Report NCRG/97/003, Aston Univ., June 1997.
[13] C.M. Bishop and M.E. Tipping, “A Hierarchical Latent Variable Model for Data Visualization,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 281-293, Mar. 1998.
[14] H. Attias, “Independent Factor Analysis,” Neural Computation, vol. 11, no. 4, pp. 803-851, May 1999.
[15] K.-y. Chang and J. Ghosh, “Probabilistic Principal Surfaces,” Proc. Int'l Joint Conf. Neural Networks, p. 605, July 1999.
[16] C.M. Bishop, M. Svensén, and C.K.I. Williams, “GTM: The Generative Topographic Mapping,” Neural Computation, vol. 10, no. 1, pp. 215-235, 1998.
[17] H. Ritter, T. Martinetz, and K. Schulten, Neural Computation and Self-Organizing Maps: An Introduction. Reading, Mass.: Addison-Wesley, 1992.
[18] F. Mulier and V. Cherkassky, “Self-Organization as an Iterative Kernel Smoothing Process,” Neural Computation, vol. 7, pp. 1165-1177, 1995.
[19] C. de Boor, A Practical Guide to Splines. New York: Springer-Verlag, 1978.
[20] W. Cleveland and S. Devlin, “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting,” J. Am. Statistical Assoc., vol. 83, pp. 596-610, 1988.
[21] T. Duchamp and W. Stuetzle, “Extremal Properties of Principal Curves in the Plane,” Annals of Statistics, vol. 24, no. 4, pp. 1511-1520, 1996.
[22] J.D. Banfield and A.E. Raftery, “Ice Floe Identification in Satellite Images Using Mathematical Morphology and Clustering about Principal Curves,” J. Am. Statistical Assoc., vol. 87, no. 417, pp. 7-16, Mar. 1992.
[23] K.-y. Chang and J. Ghosh, “Principal Curves for Nonlinear Feature Extraction and Classification,” SPIE: Applications of Artificial Neural Networks in Image Processing III, vol. 3307, pp. 120-129, Jan. 1998.
[24] R. Tibshirani, “Principal Curves Revisited,” Statistics and Computing, vol. 2, pp. 183-190, 1992.
[25] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum-Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., vol. 39, no. 1, pp. 1-38, 1977.
[26] C.F.J. Wu, “On the Convergence Properties of the EM Algorithm,” Annals of Statistics, vol. 11, pp. 95-103, 1983.
[27] B. Kégl, A. Krzyzak, T. Linder, and K. Zeger, “Learning and Design of Principal Curves,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 3, pp. 281-297, Mar. 2000.
[28] B. Kégl, A. Krzyzak, T. Linder, and K. Zeger, “Principal Curves: Learning and Convergence,” Proc. IEEE Int'l Symp. Information Theory, 1998.
[29] B. Kégl, A. Krzyzak, T. Linder, and K. Zeger, “A Polygonal Line Algorithm for Constructing Principal Curves,” Neural Information Processing Systems, vol. 11, pp. 501-507, 1998.
[30] P. Delicado, “Principal Curves and Principal Oriented Points,” Technical Report 309, Departament d'Economia i Empresa, Universitat Pompeu Fabra, 1998.
[31] P. Delicado, “Another Look at Principal Curves and Surfaces,” unpublished, 1999.
[32] T. Hastie, “Principal Curves and Surfaces,” PhD thesis, Stanford Univ., 1984.
[33] J.H. Friedman, “Multivariate Adaptive Regression Splines,” Annals of Statistics, vol. 19, pp. 1-141, 1991.
[34] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.
[35] E. Erwin, K. Obermayer, and K. Schulten, “Self-Organizing Maps: Ordering, Convergence Properties, and Energy Functions,” Biological Cybernetics, vol. 67, pp. 47-55, 1992.
[36] J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation. Addison-Wesley, 1991.
[37] C.M. Bishop and M. Svensén, “GTM: The Generative Topographic Mapping,” Technical Report NCRG/96/015, Aston Univ., Apr. 1997.
[38] T.K. Moon, “The Expectation-Maximization Algorithm in Signal Processing,” IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47-60, 1996.
[39] K.-y. Chang, “Image and Signal Processing Using Neural Networks,” MS thesis, Dept. of Electrical Eng., Univ. of Hawaii at Manoa, Dec. 1994.
[40] H. Bourland and Y. Kamp, “Auto-Association by Multilayer Perceptrons and Singular Value Decomposition,” Biological Cybernetics, vol. 59, pp. 291-294, 1988.
[41] M.A. Kramer, “Nonlinear Principal Component Analysis Using Autoassociative Neural Networks,” Am. Inst. Chemical Eng. J., vol. 37, no. 2, pp. 233-243, 1991.
[42] S. Tan and M.L. Mavrovouniotis, “Reducing Data Dimensionality through Optimizing Neural Network Inputs,” Am. Inst. Chemical Eng. J., vol. 41, no. 6, pp. 1471-1480, June 1995.
[43] D. Dong and T.J. McAvoy, “Nonlinear Principal Component Analysis-Based on Principal Curves and Neural Networks,” Computers and Chemical Eng., vol. 20, no. 1, pp. 65-78, 1996.
[44] J. Karhunen, E. Oja, L. Wang, R. Vigario, and J. Joutsensalo, “A Class of Neural Networks for Independent Component Analysis,” IEEE Trans. Neural Networks, vol. 8, no. 3, pp. 486-504, 1997.
[45] E.C. Malthouse, “Limitations of Nonlinear PCA as Performed with Generic Neural Networks,” IEEE Trans. Neural Networks, vol. 9, no. 1, pp. 165-173, Jan. 1998.
[46] C.M. Bishop, M. Svensén, and C.K.I. Williams, “Developments of the Generative Topographic Mapping,” Neurocomputing, vol. 21, pp. 203-224, 1998.
[47] M. Svensén, “GTM: The Generative Topographic Mapping,” PhD thesis, Aston Univ., Birmingham, UK, 1998.
[48] C.L. Blake and C.J. Merz, “UCI Repository of Machine Learning Databases,” 1998.
[49] K.-y. Chang and J. Ghosh, “Three-Dimensional Model-Based Object Recognition and Pose Estimation Using Probabilistic Principal Surfaces,” SPIE: Applications of Artificial Neural Networks in Image Processing V, pp. 192-203, Jan. 2000.
[50] K.-y. Chang, “Nonlinear Dimensionality Reduction Using Probabilistic Principal Surfaces,” PhD thesis, Dept. of Electrical and Computer Eng., Univ. of Texas at Austin, May 2000.
[51] J.R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons, 1988.

Index Terms:
Principal curve, principal surface, probabilistic, dimensionality reduction, nonlinear manifold, generative topographic mapping.
Kui-yu Chang, Joydeep Ghosh, "A Unified Model for Probabilistic Principal Surfaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 1, pp. 22-41, Jan. 2001, doi:10.1109/34.899944