IEEE Transactions on Pattern Analysis & Machine Intelligence, 2009, vol. 31, Issue No. 02 - February

pp: 260-274

Xuelong Li, University of London, London

Xindong Wu, University of Vermont, Burlington

Stephen J. Maybank, Birkbeck College, University of London, London

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2008.70

ABSTRACT

Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results on synthetic data, data sets from the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a promising discriminative subspace selection method, which significantly reduces the class separation problem in comparison with the linear dimensionality reduction step in FLDA and several of its representative extensions.
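The contrast between the arithmetic and geometric means of the KL divergences can be illustrated with a small numerical sketch. It is not the paper's algorithm: it only scores two hand-picked one-dimensional projections for three Gaussian classes that share an identity covariance matrix and are assumed to have equal priors. The class means, the two projections, and the helper names (`projected_kl`, `pairwise_kl`, `arithmetic_mean`, `geometric_mean`) are illustrative assumptions, and the normalized criterion 2) is not covered.

```python
import itertools

import numpy as np


def projected_kl(mean_i, mean_j, cov, W):
    """KL divergence between N(mean_i, cov) and N(mean_j, cov) after projection by W.

    With a shared covariance the divergence reduces to half the squared
    Mahalanobis distance between the projected means.
    """
    diff = W.T @ (mean_i - mean_j)
    proj_cov = W.T @ cov @ W
    return 0.5 * float(diff @ np.linalg.solve(proj_cov, diff))


def pairwise_kl(means, cov, W):
    """KL divergences for every unordered pair of classes in the subspace."""
    return np.array(
        [projected_kl(a, b, cov, W) for a, b in itertools.combinations(means, 2)]
    )


def arithmetic_mean(divergences):
    return float(np.mean(divergences))


def geometric_mean(divergences):
    return float(np.prod(divergences) ** (1.0 / len(divergences)))


# Hypothetical setup: two classes close together, one far away (c = 3, so a
# 1-D subspace has dimension strictly lower than c - 1 = 2).
means = [np.array([0.0, 0.0]), np.array([0.0, 2.0]), np.array([10.0, 1.0])]
cov = np.eye(2)

W_axis = np.array([[1.0], [0.0]])    # projects the two close classes onto the same point
W_tilted = np.array([[0.6], [0.8]])  # keeps all three classes apart

am_axis = arithmetic_mean(pairwise_kl(means, cov, W_axis))
am_tilted = arithmetic_mean(pairwise_kl(means, cov, W_tilted))
gm_axis = geometric_mean(pairwise_kl(means, cov, W_axis))
gm_tilted = geometric_mean(pairwise_kl(means, cov, W_tilted))

print("arithmetic mean:", am_axis, am_tilted)
print("geometric mean: ", gm_axis, gm_tilted)
```

In this toy setup the arithmetic mean scores the axis-aligned projection higher even though it merges the two nearby classes, while the geometric mean drops to zero for that projection and therefore favors the tilted one. This mirrors the class separation problem described in the abstract, under the stated assumptions.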

INDEX TERMS

Arithmetic mean, Fisher's linear discriminant analysis (FLDA), geometric mean, Kullback-Leibler (KL) divergence, machine learning, subspace selection (or dimensionality reduction), visualization.

CITATION

Xuelong Li, Xindong Wu, Stephen J. Maybank, "Geometric Mean for Subspace Selection",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 31, no. 2, pp. 260-274, February 2009, doi:10.1109/TPAMI.2008.70