Canonical Correlation Analysis for Multilabel Classification: A Least-Squares Formulation, Extensions, and Analysis
January 2011 (vol. 33 no. 1)
pp. 194-200
Liang Sun, Arizona State University, Tempe
Shuiwang Ji, Arizona State University, Tempe
Jieping Ye, Arizona State University, Tempe
Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction, in which the two sets of variables are derived from the data and the class labels, respectively. It is well known that CCA can be formulated as a least-squares problem in the binary-class case. However, the extension to the more general setting remains unclear. In this paper, we show that, under a mild condition that tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem. Based on this equivalence relationship, efficient algorithms for solving least-squares problems can be applied to scale CCA to very large data sets. In addition, we propose several CCA extensions, including a sparse CCA formulation based on 1-norm regularization. We further extend the least-squares formulation to partial least squares. We also show that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights into the effect of regularization on CCA. Experiments on benchmark multilabel data sets confirm the established equivalence relationships. The results also demonstrate the effectiveness and efficiency of the proposed CCA extensions.
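
The least-squares view described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code, and the abstract does not give the construction: the target matrix H = Yc (Yc^T Yc)^(-1/2), the reading of the "mild condition" as the centered data matrix having rank n - 1 (typical when the number of features exceeds the number of samples), and all function names are assumptions made for illustration. Under those assumptions, a minimum-norm least-squares fit of the centered data to H yields a projection whose output is perfectly correlated with the label space.

```python
import numpy as np

def inv_sqrt_psd(A):
    """Inverse square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

def ls_cca_projection(X, Y, rcond=1e-10):
    """Least-squares fit of centered X (n x d) to an orthonormalized label target.

    The target H is an assumption of this sketch, not taken from the abstract.
    """
    Xc = X - X.mean(axis=0)            # center features
    Yc = Y - Y.mean(axis=0)            # center labels
    H = Yc @ inv_sqrt_psd(Yc.T @ Yc)   # n x k target with H^T H = I
    # Minimum-norm least-squares solution; rcond discards the near-zero
    # singular value introduced by centering.
    W, *_ = np.linalg.lstsq(Xc, H, rcond=rcond)
    return W, Xc, Yc, H

def canonical_correlations(A, B):
    """Canonical correlations as cosines of principal angles between column spaces."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

# Toy high-dimensional multilabel data: more features (d) than samples (n).
rng = np.random.default_rng(0)
n, d, k = 30, 100, 3
X = rng.standard_normal((n, d))
# Deterministic multilabel matrix: label j of sample i is bit j of i.
Y = np.array([[(i >> j) & 1 for j in range(k)] for i in range(n)], dtype=float)

W, Xc, Yc, H = ls_cca_projection(X, Y)
corrs = canonical_correlations(Xc @ W, Yc)
```

Because only a linear least-squares problem is solved, the dense solver here could in principle be swapped for an iterative one on large data, which is the scaling argument the abstract makes; that substitution is not shown in this sketch.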

Index Terms:
Canonical correlation analysis, least squares, multilabel learning, partial least squares, regularization.
Liang Sun, Shuiwang Ji, Jieping Ye, "Canonical Correlation Analysis for Multilabel Classification: A Least-Squares Formulation, Extensions, and Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 194-200, Jan. 2011, doi:10.1109/TPAMI.2010.160