Issue No.10 - Oct. (2013 vol.25)
pp: 2192-2205
Zhao Zhang, City University of Hong Kong, Hong Kong
Mingbo Zhao, City University of Hong Kong, Hong Kong
Tommy W.S. Chow, City University of Hong Kong, Hong Kong
ABSTRACT
This paper incorporates group sparse representation into the well-known canonical correlation analysis (CCA) framework and proposes a novel discriminant feature extraction technique named group sparse canonical correlation analysis (GSCCA). GSCCA uses two sets of variables and aims to preserve the group sparse (GS) characteristics of the data within each set in addition to maximizing the global interset covariance. With the GS weights computed prior to feature extraction, the locality, sparsity, and discriminant information of the data can be adaptively determined. The GS weights are obtained from an NP-hard group-sparsity promoting problem that considers all highly correlated data within a group. By defining one of the two variable sets as the class label matrix, GSCCA is effectively extended to multiclass scenarios. GSCCA is then theoretically formulated as a least-squares problem, as is CCA. Comparative analysis between this work and related studies demonstrates that our algorithm is more general and exhibits attractive properties. The projection matrix of GSCCA is solved analytically by applying eigen-decomposition and trace ratio (TR) optimization. Extensive benchmark simulations are conducted to examine GSCCA. The results show that our approach delivers promising performance compared with other related algorithms.
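As a rough illustration of the starting point described in the abstract, the following is a minimal sketch of plain regularized CCA in which the second variable set is a one-hot class label matrix, solved by eigen-decomposition. It is not GSCCA: the GS weights and the trace ratio step are omitted. The function name `cca_with_labels`, the ridge term `reg`, and the Cholesky whitening are assumptions made for this example only.

```python
import numpy as np

def cca_with_labels(X, labels, n_components, reg=1e-6):
    """Plain regularized CCA between features X (n_samples x n_features)
    and a one-hot class label matrix. Illustrative only; no GS weights."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).ravel()
    classes, idx = np.unique(labels, return_inverse=True)
    Y = np.eye(len(classes))[idx]            # one-hot label matrix

    # Center both variable sets.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Within-set and interset covariance matrices (ridge keeps them invertible).
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    # Whiten with Cxx = L L^T, then eigen-decompose the symmetric matrix
    # M = L^{-1} Cxy Cyy^{-1} Cyx L^{-T}; its eigenvalues are the squared
    # canonical correlations.
    L = np.linalg.cholesky(Cxx)
    A = np.linalg.solve(L, Cxy)
    M = A @ np.linalg.solve(Cyy, A.T)
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:n_components]

    # Map the eigenvectors back to the original feature coordinates.
    W = np.linalg.solve(L.T, evecs[:, order])
    rho = np.sqrt(np.clip(evals[order], 0.0, 1.0))
    return W, rho

# Usage on synthetic two-class data (hypothetical, for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
X_demo[20:, 0] += 10.0                       # shift class 1 along feature 0
y_demo = np.array([0] * 20 + [1] * 20)
W_demo, rho_demo = cca_with_labels(X_demo, y_demo, n_components=1)
Z_demo = (X_demo - X_demo.mean(axis=0)) @ W_demo[:, 0]
```

Projecting the centered data onto the columns of `W_demo` gives the extracted features; on well-separated classes the leading canonical correlation should be close to 1.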
INDEX TERMS
Vectors, Correlation, Feature extraction, Sparse matrices, Optimization, Iron, Encoding, multiclass classification, feature extraction, Canonical correlation analysis, group sparse representation
CITATION
Zhao Zhang, Mingbo Zhao, Tommy W.S. Chow, "Binary- and Multi-class Group Sparse Canonical Correlation Analysis for Feature Extraction and Classification", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 10, pp. 2192-2205, Oct. 2013, doi:10.1109/TKDE.2012.217