A Least-Squares Framework for Component Analysis
June 2012 (vol. 34, no. 6)
pp. 1041-1055
F. De la Torre, Robotics Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
Over the last century, Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), Locality Preserving Projections (LPP), and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization, and clustering. CA techniques are appealing because many can be formulated as eigen-problems, offering great potential for learning linear and nonlinear representations of data in closed form. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as solving generalized eigen-problems with rank-deficient matrices (e.g., the small sample size problem), the lack of an intuitive interpretation of normalization factors, and difficulty in understanding the commonalities and differences between CA methods. This paper proposes a unified least-squares framework to formulate many CA methods. We show how PCA, LDA, CCA, LPP, SC, and their kernel and regularized extensions each correspond to a particular instance of least-squares weighted kernel reduced rank regression (LS-WKRRR). The LS-WKRRR formulation of CA methods has several benefits: 1) it provides a clean connection between many CA techniques and an intuitive framework to understand normalization factors; 2) it yields efficient numerical schemes to solve CA techniques; 3) it overcomes the small sample size problem; 4) it provides a framework to easily extend CA methods. We derive weighted generalizations of PCA, LDA, SC, and CCA, and several new CA techniques.
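
To make the reduced-rank-regression view concrete, the sketch below recovers the PCA subspace by directly minimizing a rank-r least-squares objective with alternating least squares, instead of solving an eigen-problem. This is a minimal illustration of the general idea, not the paper's LS-WKRRR algorithm: the unweighted objective ||D - AB||_F^2 and the variable names (D, A, B, r) are assumptions chosen for clarity. In the paper's framework, analogous alternating updates on weighted and kernelized objectives recover the other CA methods as special cases.

import numpy as np

def pca_via_rank_r_least_squares(D, r, n_iter=200):
    # D: d x n mean-centered data matrix (columns are samples); r: target rank.
    # Alternating least squares on min_{A,B} ||D - A @ B||_F^2,
    # where A is d x r and B is r x n.
    d, n = D.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((d, r))
    for _ in range(n_iter):
        B = np.linalg.lstsq(A, D, rcond=None)[0]        # fix A, solve for B
        A = np.linalg.lstsq(B.T, D.T, rcond=None)[0].T  # fix B, solve for A
    # The column span of A matches the span of the top-r principal directions;
    # orthonormalize it for comparison with eigen-decomposition-based PCA.
    Q, _ = np.linalg.qr(A)
    return Q

# Example usage on synthetic data:
D = np.random.randn(5, 200)
D = D - D.mean(axis=1, keepdims=True)
Q = pca_via_rank_r_least_squares(D, r=2)
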

[1] I.T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.
[2] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space," The London, Edinburgh and Dublin Philosophical Magazine and J., vol. 6, pp. 559-572, 1901.
[3] H. Hotelling, "Analysis of a Complex of Statistical Variables into Principal Components," J. Educational Psychology, vol. 24, no. 6, pp. 417-441, 1933.
[4] R.A. Fisher, "The Statistical Utilization of Multiple Measurements," Annals of Eugenics, vol. 8, pp. 376-386, 1938.
[5] R.A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, vol. 7, pp. 179-188, 1936.
[6] H. Hotelling, "Relations between Two Sets of Variates," Biometrika, vol. 28, pp. 321-377, 1936.
[7] M. Belkin and P. Niyogi, "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation," Neural Computation, vol. 15, no. 6, pp. 1373-1396, 2003.
[8] X. He and P. Niyogi, "Locality Preserving Projections," Proc. Neural Information Processing Systems, 2003.
[9] B. Mohar, "Some Applications of Laplace Eigenvalues of Graphs," Graph Symmetry: Algebraic Methods and Applications, pp. 225-275. Springer, 1997.
[10] F. De la Torre and T. Kanade, "Discriminative Cluster Analysis," Proc. Int'l Conf. Machine Learning, 2006.
[11] F. De la Torre, "A Least-Squares Unified View of PCA, LDA, CCA and Spectral Graph Methods," Technical Report CMU-RI-TR-08-29, Robotics Inst., Carnegie Mellon Univ., May 2008.
[12] M. Borga, "Learning Multidimensional Signal Processing," PhD dissertation, Linköping Univ., Sweden, 1998.
[13] S. Roweis and Z. Ghahramani, "A Unifying Review of Linear Gaussian Models," Neural Computation, vol. 11, no. 2, pp. 305-345, 1999.
[14] S. Yan, D. Xu, B. Zhang, and H. Zhang, "Graph Embedding and Extensions: A General Framework for Dimensionality Reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40-51, Jan. 2007.
[15] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[16] T.W. Anderson, "Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal Distributions," Annals of Math. Statistics, vol. 12, pp. 327-351, 1951.
[17] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, second ed. Wiley, 1984.
[18] S.S. Haykin, Adaptive Filter Theory. Prentice-Hall, 1996.
[19] L. Scharf, "The SVD and Reduced Rank Signal Processing," Signal Processing, vol. 25, no. 2, pp. 113-133, 2002.
[20] K.I. Diamantaras, Principal Component Neural Networks (Theory and Applications). John Wiley & Sons, 1996.
[21] F. De la Torre and M.J. Black, "Dynamic Coupled Component Analysis," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[22] P. Baldi and K. Hornik, "Neural Networks and Principal Component Analysis: Learning from Examples without Local Minima," Neural Networks, vol. 2, pp. 53-58, 1989.
[23] H. Murase and S.K. Nayar, "Visual Learning and Recognition of 3D Objects from Appearance," Int'l J. Computer Vision, vol. 14, no. 1, pp. 5-24, 1995.
[24] K.J. Bathe and E. Wilson, Numerical Methods in Finite Element Analysis. Prentice-Hall, 2011.
[25] A. Buchanan and A. Fitzgibbon, "Damped Newton Algorithms for Matrix Factorization with Missing Data," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[26] F. De la Torre and M.J. Black, "A Framework for Robust Subspace Learning," Int'l J. Computer Vision, vol. 54, pp. 117-142, 2003.
[27] R. Fletcher, Practical Methods of Optimization. John Wiley and Sons, 1987.
[28] A. Blake and A. Zisserman, Visual Reconstruction. MIT Press Series, 1987.
[29] E. Oja, "A Simplified Neuron Model as Principal Component Analyzer," J. Math. Biology, vol. 15, pp. 267-273, 1982.
[30] S. Roweis, "EM Algorithms for PCA and SPCA," Proc. Neural Information Processing Systems, 1997.
[31] K.R. Gabriel and S. Zamir, "Lower Rank Approximation of Matrices by Least Squares with Any Choice of Weights," Technometrics, vol. 21, pp. 489-498, 1979.
[32] H. Shum, K. Ikeuchi, and R. Reddy, "Principal Component Analysis with Missing Data and Its Application to Polyhedral Object Modeling," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 9, pp. 855-867, Sept. 1995.
[33] M. Tipping and C.M. Bishop, "Probabilistic Principal Component Analysis," J. Royal Statistical Soc. B, vol. 61, pp. 611-622, 1999.
[34] B. Schölkopf, A.J. Smola, and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[35] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press Series, 2002.
[36] M. Irani and P. Anandan, "Factorization with Uncertainty," Proc. European Conf. Computer Vision, 2000.
[37] M.J. Greenacre, Theory and Applications of Correspondence Analysis. Academic Press, 1984.
[38] I. Tsang and J. Kwok, "Distance Metric Learning with Kernels," Proc. Int'l Conf. Artificial Neural Networks, 2003.
[39] R. Hartley and F. Schaffalitzky, "Powerfactorization: An Approach to Affine Reconstruction with Missing and Uncertain Data," Proc. Australia-Japan Advance Workshop Computer Vision, 2003.
[40] D. Skocaj and A. Leonardis, "Weighted and Robust Incremental Method for Subspace Learning," Proc. Int'l Conf. Computer Vision, 2003.
[41] P. Aguiar, M. Stosic, and J. Xavier, "Spectrally Optimal Factorization of Incomplete Matrices," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[42] C. Rao, "The Utilization of Multiple Measurements in Problems of Biological Classification," J. Royal Statistical Soc.-Series B, vol. 10, no. 2, pp. 159-203, 1948.
[43] A. Martinez and A. Kak, "PCA versus LDA," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, Feb. 2001.
[44] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces versus Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[45] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2001.
[46] J. Ye, "Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems," J. Machine Learning Research, vol. 6, no. 1, pp. 483-502, Sept. 2005.
[47] S. Zhang and T. Sim, "Discriminant Subspace Analysis: A Fukunaga-Koontz Approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1732-1745, Oct. 2007.
[48] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. John Wiley and Sons, Inc., 2001.
[49] P. Gallinari, S. Thiria, F. Badran, and F. Fogelman-Soulie, "On the Relations between Discriminant Analysis and Multilayer Perceptrons," Neural Networks, vol. 4, pp. 349-360, 1991.
[50] J. Ye, "Least Squares Linear Discriminant Analysis," Proc. Int'l Conf. Machine Learning, 2007.
[51] S. Mika, "Kernel Fisher Discriminants," PhD thesis, Univ. of Technology, Berlin, 2002.
[52] K. Mardia, J. Kent, and J. Bibby, Multivariate Analysis. Academic Press, 1979.
[53] V.J. Yohai and M.S. Garcia, "Canonical Variables as Optimal Predictors," The Annals of Statistics, vol. 8, no. 4, pp. 865-869, 1980.
[54] M. Tso, "Reduced-Rank Regression and Canonical Analysis," J. Royal Statistical Soc., Series B, vol. 43, no. 2, pp. 183-189, 1981.
[55] M. Loog, R. Duin, and R. Hacb-Umbach, "Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 762-766, July 2001.
[56] J.B. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
[57] S. Roweis and L. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[58] J. Ham, D. Lee, S. Mika, and B. Schölkopf, "A Kernel View of the Dimensionality Reduction of Manifolds," Proc. Int'l Conf. Machine Learning, 2004.
[59] Y. Bengio, P. Vincent, J. Paiement, P. Vincent, and M. Ouimet, "Learning Eigenfunctions Links Spectral Embedding and Kernel PCA," Neural Computation, vol. 16, pp. 2197-2219, 2004.
[60] X. He, D. Cai, S. Yan, and H. Zhang, "Neighborhood Preserving Embedding," Proc. IEEE Int'l Conf. Computer Vision, 2005.
[61] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. Fifth Berkeley Symp. Math. Statistics and Probability, pp. 281-297, 1967.
[62] A.K. Jain, Algorithms for Clustering Data. Prentice Hall, 1988.
[63] H. Zha, C. Ding, M. Gu, X. He, and H. Simon, "Spectral Relaxation for K-Means Clustering," Proc. Neural Information Processing Systems, 2001.
[64] C. Ding and X. He, "k-Means Clustering via Principal Component Analysis," Proc. Int'l Conf. Machine Learning, 2004.
[65] R. Zass and A. Shashua, "A Unifying Approach to Hard and Probabilistic Clustering," Proc. IEEE Int'l Conf. Computer Vision, 2005.
[66] J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
[67] A.Y. Ng, M. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Proc. Neural Information Processing Systems, 2002.
[68] S. Yu and J. Shi, "Multiclass Spectral Clustering," Proc. IEEE Int'l Conf. Computer Vision, 2003.
[69] F.R.K. Chung, Spectral Graph Theory, CBMS Regional Conf. Series in Math., vol. 92. Am. Math. Soc., 1997.
[70] L. Hagen and A. Kahng, "New Spectral Methods for Ratio Cut Partitioning and Clustering," IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, vol. 11, no. 9, pp. 1074-1085, Sept. 1992.
[71] D. Verma and M. Meila, "Comparison of Spectral Clustering Methods," Proc. Neural Information Processing Systems, 2003.
[72] R. Zass and A. Shashua, "Doubly Stochastic Normalization for Spectral Clustering," Proc. Neural Information Processing Systems, 2006.
[73] D. Tolliver, "Spectral Rounding and Image Segmentation," Technical Report CMU-RI-TR-06-44, Robotics Inst., Carnegie Mellon Univ., Aug. 2006.
[74] M. Filippone, F. Camastra, F. Masulli, and S. Rovetta, "A Survey of Kernel and Spectral Methods for Clustering," Pattern Recognition, vol. 41, no. 1, pp. 176-190, 2008.
[75] I.S. Dhillon, Y. Guan, and B. Kulis, "Weighted Graph Cuts without Eigenvectors: A Multilevel Approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 11, pp. 1944-1957, Nov. 2007.
[76] Y. Weiss, "Segmentation Using Eigenvectors: A Unifying View," Proc. IEEE Int'l Conf. Computer Vision, 1999.
[77] A. Rahimi and B. Recht, "Clustering with Normalized Cuts Is Clustering with a Hyperplane," Proc. Workshop Statistical Learning in Computer Vision, 2004.
[78] C. Ding and T. Li, "Adaptive Dimension Reduction Using Discriminant Analysis and K-Means Clustering," Proc. Int'l Conf. Machine Learning, 2007.
[79] F. Bach and Z. Harchaoui, "Diffrac: A Discriminative and Flexible Framework for Clustering," Proc. Neural Information Processing Systems, 2007.
[80] J. Ye, Z. Zhao, and M. Wu, "Discriminative K-Means for Clustering," Proc. Neural Information Processing Systems, 2007.
[81] C. Ding, X. He, and H. Simon, "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering," Proc. SIAM Int'l Conf. Data Mining, 2005.
[82] D. Lee and H. Seung, "Algorithms for Non-Negative Matrix Factorization," Proc. Neural Information Processing Systems, 2000.
[83] W.H. Lawton and E.A. Sylvestre, "Self Modeling Curve Resolution," Technometrics, vol. 13, no. 3, pp. 617-633, 1971.
[84] P. Paatero and U. Tapper, "Positive Matrix Factorization: A Non-Negative Factor Model with Optimal Utilization of Error Estimates of Data Values," Environmetrics, vol. 5, pp. 111-126, 1994.
[85] J. Shen and G. Israel, "A Receptor Model Using a Specific Non-Negative Transformation Technique for Ambient Aerosol," Atmospheric Environment, vol. 23, no. 10, pp. 2289-2298, 1989.
[86] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 137-143, July 1997.
[87] F. Zhou, F. De la Torre, and J.K. Hodgins, "Aligned Cluster Analysis for Temporal Segmentation of Human Motion," Proc. Eighth IEEE Int'l Conf. Automatic Face and Gesture Recognition, 2008.
[88] H. Shimodaira, K.-I. Noma, M. Nakai, and S. Sagayama, "Dynamic Time-Alignment Kernel in Support Vector Machine," Proc. Neural Information Processing Systems, 2001.
[89] F. Zhou and F. De la Torre, "Canonical Time Warping," Proc. Neural Information Processing Systems, 2009.
[90] H. Bischof, H. Wildenauer, and A. Leonardis, "Illumination Insensitive Recognition Using Eigenspaces," Computer Vision and Image Understanding, vol. 95, no. 1, pp. 86-104, 2004.
[91] W.T. Freeman and E.H. Adelson, "The Design and Use of Steerable Filters," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891-906, Sept. 1991.
[92] F. De la Torre, A. Collet, J. Cohn, and T. Kanade, "Filtered Component Analysis to Increase Robustness to Local Minima in Appearance Models," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[93] Y. LeCun and Y. Bengio, "Convolutional Networks for Images, Speech, and Time-Series," The Handbook of Brain Theory and Neural Networks, M.A. Arbib, ed., MIT Press, 1995.
[94] B.J. Frey and N. Jojic, "Transformation-Invariant Clustering Using the EM Algorithm," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 1-17, Jan. 2003.
[95] F. De la Torre and M.J. Black, "Robust Parameterized Component Analysis: Theory and Applications to 2D Facial Appearance Models," Computer Vision and Image Understanding, vol. 91, pp. 53-71, 2003.
[96] E.G. Learned-Miller, "Data Driven Image Models through Continuous Joint Alignment," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 236-250, Feb. 2006.
[97] M. Cox, S. Lucey, S. Sridharan, and J. Cohn, "Least Squares Congealing for Unsupervised Alignment of Images," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[98] S. Baker, I. Matthews, and J. Schneider, "Automatic Construction of Active Appearance Models as an Image Coding Problem," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 10, pp. 1380-1384, Oct. 2004.
[99] I. Kokkinos and A. Yuille, "Unsupervised Learning of Object Deformation Models," Proc. Int'l Conf. Computer Vision, 2007.
[100] F. De la Torre and M. Nguyen, "Parameterized Kernel Principal Component Analysis: Theory and Applications to Supervised and Unsupervised Image Alignment," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[101] M.J. Black and A.D. Jepson, "Eigentracking: Robust Matching and Tracking of Objects Using View-Based Representation," Int'l J. Computer Vision, vol. 26, no. 1, pp. 63-84, 1998.
[102] T.F. Cootes, G.J. Edwards, and C.J. Taylor, "Active Appearance Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[103] T. Cootes and C. Taylor, "Statistical Models of Appearance for Computer Vision," technical report, Univ. of Manchester, 2001.
[104] G. Roig, X. Boix, and F. De la Torre, "Feature Selection for Subspace Image Matching," Proc. Second IEEE Int'l Workshop Subspace Methods, 2009.
[105] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[106] B. Moghaddam, G. A, Y. Weiss, and S. Avidan, "Sparse Regression as a Sparse Eigenvalue Problem," Proc. Information Theory and Applications Workshop, 2008.
[107] B. Olshausen and D. Field, "Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?" Vision Research, vol. 37, pp. 3311-3325, 1997.
[108] C. Williams, "On a Connection between Kernel PCA and Metric Multidimensional Scaling," Proc. Neural Information Processing Systems, 2001.
[109] C. Ding, T. Li, and W. Peng, "On the Equivalence between Non-Negative Matrix Factorization and Probabilistic Latent Semantic Indexing," Computational Statistics and Data Analysis, vol. 52, pp. 3913-3927, 2008.
[110] M. Collins, S. Dasgupta, and R. Schapire, "A Generalization of Principal Components Analysis to the Exponential Family," Proc. Neural Information Processing Systems, 2002.
[111] G. Gordon, "Generalized Linear Models," Proc. Neural Information Processing Systems, 2002.
[112] B.S. Everitt, An Introduction to Latent Variable Models. Chapman and Hall, 1984.
[113] T.G. Kolda and B.W. Bader, "Tensor Decompositions and Applications," SIAM Rev., vol. 51, pp. 455-500, 2009.
[114] J.R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, 1999.
[115] F. De la Torre and T. Kanade, "Multimodal Oriented Discriminant Analysis," Proc. Int'l Conf. Machine Learning, 2005.

Index Terms:
regression analysis, correlation methods, data visualisation, feature extraction, learning (artificial intelligence), least squares approximations, matrix algebra, pattern classification, pattern clustering, small sample size problem, principal component analysis, linear discriminant analysis, canonical correlation analysis, locality preserving projections, spectral clustering, k-means, reduced rank regression, kernel methods, dimensionality reduction
Citation:
F. De la Torre, "A Least-Squares Framework for Component Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 6, pp. 1041-1055, June 2012, doi:10.1109/TPAMI.2011.184