Learning and Design of Principal Curves
March 2000 (vol. 22 no. 3)
pp. 281-297

Abstract—Principal curves have been defined as “self-consistent” smooth curves which pass through the “middle” of a d-dimensional probability distribution or data cloud. They give a summary of the data and also serve as an efficient feature extraction tool. We take a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition makes it possible to theoretically analyze principal curve learning from training data and it also leads to a new practical construction. Our theoretical learning scheme chooses a curve from a class of polygonal lines with $k$ segments and with a given total length to minimize the average squared distance over $n$ training points drawn independently. Convergence properties of this learning scheme are analyzed and a practical version of this theoretical algorithm is implemented. In each iteration of the algorithm, a new vertex is added to the polygonal line and the positions of the vertices are updated so that they minimize a penalized squared distance criterion. Simulation results demonstrate that the new algorithm compares favorably with previous methods, both in terms of performance and computational complexity, and is more robust to varying data models.
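The theoretical learning scheme scores a candidate polygonal line with $k$ segments by its average squared distance to the $n$ training points, subject to a bound on its total length. The pure-Python sketch below illustrates only that empirical criterion and the length constraint. It is not the authors' implementation, and the function names are hypothetical. The distance from a point to the curve is taken as the minimum squared Euclidean distance over the $k$ segments. The penalized criterion and the vertex-insertion steps of the practical algorithm are not reproduced here.

```python
import math
from typing import Sequence

Point = Sequence[float]


def seg_sq_dist(p: Point, a: Point, b: Point) -> float:
    """Squared Euclidean distance from point p to the segment [a, b]."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        t = 0.0  # degenerate segment: distance to the single point a
    else:
        # Projection parameter, clipped so the foot point stays on the segment.
        t = sum((pj - aj) * c for pj, aj, c in zip(p, a, ab)) / denom
        t = min(1.0, max(0.0, t))
    return sum((pj - (aj + t * c)) ** 2 for pj, aj, c in zip(p, a, ab))


def polyline_length(vertices: Sequence[Point]) -> float:
    """Total length of the polygonal line through k+1 vertices."""
    return sum(math.dist(u, v) for u, v in zip(vertices, vertices[1:]))


def empirical_sq_distance(points: Sequence[Point], vertices: Sequence[Point]) -> float:
    """Average over the n points of the squared distance to the polygonal line."""
    total = 0.0
    for p in points:
        # Distance to the curve = minimum over its k segments.
        total += min(
            seg_sq_dist(p, vertices[i], vertices[i + 1])
            for i in range(len(vertices) - 1)
        )
    return total / len(points)
```

For example, the three points `[[0, 1], [1, -1], [2, 0.5]]` lie at squared distances 1, 1 and 0.25 from the single segment `[[0, 0], [2, 0]]`. Therefore `empirical_sq_distance` returns 0.75 and `polyline_length` returns 2.0. In the learning scheme, this criterion would be minimized over all $k$-segment lines whose length stays within the given bound.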

Index Terms:
Learning systems, unsupervised learning, feature extraction, vector quantization, curve fitting, piecewise linear approximation.
Balázs Kégl, Adam Krzyzak, Tamás Linder, Kenneth Zeger, "Learning and Design of Principal Curves," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 3, pp. 281-297, March 2000, doi:10.1109/34.841759