Subclass Discriminant Analysis
August 2006 (vol. 28, no. 8)
pp. 1274-1286
Over the years, many Discriminant Analysis (DA) algorithms have been proposed for the study of high-dimensional data in a large variety of problems. Each of these algorithms is tuned to a specific type of data distribution (that which best models the problem at hand). Unfortunately, in most problems the form of each class pdf is a priori unknown, and the DA algorithm that best fits the data must be selected by trial and error. Ideally, one would like a single formulation applicable to most distribution types. This can be achieved by approximating the underlying distribution of each class with a mixture of Gaussians. In this approach, the major problem to be addressed is determining the optimal number of Gaussians per class, i.e., the number of subclasses. In this paper, two criteria that determine the most convenient division of each class into a set of subclasses are derived. Extensive experimental results are shown using five databases. Comparisons are given against Linear Discriminant Analysis (LDA), Direct LDA (DLDA), Heteroscedastic LDA (HLDA), Nonparametric DA (NDA), and Kernel-Based LDA (K-LDA). We show that our method is always the best or comparable to the best.
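The core idea described above can be illustrated with a short sketch: split each class into subclasses, then find projections that separate subclasses of different classes relative to the spread of the data. This is a minimal, hypothetical illustration (the function name `sda_fit`, the use of a plain k-means to form subclasses, and the regularization constant are all assumptions of this sketch, not the paper's actual procedure, which derives criteria for choosing the number of subclasses):

```python
import numpy as np

def sda_fit(X, y, n_sub=2, n_dims=1, iters=20, seed=0):
    """Sketch of subclass discriminant analysis.

    Each class is split into n_sub subclasses with a tiny k-means
    (a stand-in for the paper's subclass-discovery step), then a
    between-subclass scatter matrix is built from pairs of subclass
    means belonging to *different* classes and maximized relative to
    the (regularized) covariance of the data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means, priors, cls = [], [], []
    for c in np.unique(y):
        Xc = X[y == c]
        centers = Xc[rng.choice(len(Xc), n_sub, replace=False)]
        for _ in range(iters):  # plain Lloyd-style k-means refinement
            lab = ((Xc[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
            centers = np.stack([Xc[lab == h].mean(0) if (lab == h).any()
                                else centers[h] for h in range(n_sub)])
        for h in range(n_sub):
            nh = int((lab == h).sum())
            if nh:  # keep only non-empty subclasses
                means.append(centers[h])
                priors.append(nh / n)
                cls.append(c)
    means, priors, cls = np.array(means), np.array(priors), np.array(cls)
    # Between-subclass scatter: only subclass pairs from different classes.
    Sb = np.zeros((d, d))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            if cls[i] != cls[j]:
                diff = (means[i] - means[j])[:, None]
                Sb += priors[i] * priors[j] * (diff @ diff.T)
    # Regularized data covariance as the "spread" term (an assumption here).
    Sx = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sx, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_dims]]  # one projection column per dimension
```

A projected sample is then simply `X @ W`. Note how this handles the bimodal case where classic LDA fails: if each class is itself a mixture of two well-separated Gaussians, the subclass means still expose the discriminative direction even when the overall class means coincide.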


Index Terms:
Feature extraction, discriminant analysis, pattern recognition, classification, eigenvalue decomposition, stability criterion, mixture of Gaussians.
Manli Zhu, Aleix M. Martínez, "Subclass Discriminant Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1274-1286, Aug. 2006, doi:10.1109/TPAMI.2006.172