 Bibliographic References 
Bayes Optimality in Linear Discriminant Analysis
April 2008 (vol. 30 no. 4)
pp. 647-657
We present an algorithm which provides the one-dimensional subspace where the Bayes error is minimized for the C class problem with homoscedastic Gaussian distributions. Our main result shows that the set of possible one-dimensional spaces v, for which the order of the projected class means is identical, defines a convex region with associated convex Bayes error function g(v). This allows for the minimization of the error function using standard convex optimization algorithms. Our algorithm is then extended to the minimization of the Bayes error in the more general case of heteroscedastic distributions. This is done by means of an appropriate kernel mapping function. This result is further extended to obtain the d-dimensional solution for any given d, by iteratively applying our algorithm to the null space of the (d - 1)-dimensional solution. We also show how this result can be used to improve upon the outcomes provided by existing algorithms, and derive a low-computational-cost linear approximation. Extensive experimental validations are provided to demonstrate the use of these algorithms in classification, data analysis, and visualization.
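The central quantity of the abstract can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation). With equal priors and a shared covariance, projecting onto a direction v gives C one-dimensional Gaussians with common variance vᵀΣv; the Bayes decision boundaries are the midpoints between adjacent projected means, so the Bayes error has a closed form in the normal CDF. The generic Nelder-Mead minimizer below stands in for the convex region-by-region optimization the paper describes; `bayes_error_1d` and `minimize_bayes_error` are illustrative names, not from the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def bayes_error_1d(v, means, cov):
    """Bayes error of C homoscedastic Gaussian classes (equal priors)
    projected onto direction v.  means: (C, p) array, cov: (p, p)."""
    v = np.asarray(v, dtype=float)
    C = len(means)
    sigma = np.sqrt(v @ cov @ v)      # shared std. dev. along v
    m = np.sort(means @ v)            # projected class means, ordered
    d = np.diff(m)                    # gaps between adjacent means
    # Each adjacent pair's midpoint boundary contributes two equal tails:
    # class i's upper tail and class i+1's lower tail, each Phi(-d/(2*sigma)).
    return (2.0 / C) * norm.cdf(-d / (2.0 * sigma)).sum()

def minimize_bayes_error(means, cov, v0):
    """Minimize g(v) over unit vectors with a generic solver (a stand-in
    for the paper's convex optimization over mean-order regions)."""
    f = lambda v: bayes_error_1d(v / np.linalg.norm(v), means, cov)
    res = minimize(f, v0, method="Nelder-Mead")
    v = res.x / np.linalg.norm(res.x)
    return v, f(res.x)
```

For three unit-variance classes with means at (0, 0), (4, 0), and (8, 0), the optimal direction is the first axis, where the error is (4/3)Φ(−2) ≈ 0.030; starting the solver from an oblique v0 recovers it.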

[1] B.B. Verbeck, M.V. Chafee, D.A. Crowe, and A.P. Georgopoulos, “Parallel Processing of Serial Movements in Prefrontal Cortex,” Proc. Nat'l Academy of Sciences of the USA, vol. 99, no. 20, pp. 13172-13177, 2002.
[2] G. Baudat and F. Anouar, “Generalized Discriminant Analysis Using a Kernel Approach,” Neural Computation, vol. 12, no. 10, pp. 2385-2404, 2000.
[3] S.V. Beiden, M.A. Maloof, and R.F. Wagner, “A General Model for Finite-Sample Effects in Training and Testing of Competing Classifiers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1561-1569, Dec. 2003.
[4] R.A. Fisher, “The Statistical Utilization of Multiple Measurements,” Annals of Eugenics, vol. 8, pp. 376-386, 1938.
[5] K. Fukunaga and J.M. Mantock, “Nonparametric Discriminant Analysis,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, pp. 671-678, 1983.
[6] S. Geisser, “Discrimination, Allocatory, and Separatory Linear Aspects,” Classification and Clustering, J. Van Ryzin, ed., pp. 301-330, 1977.
[7] P.E. Gill, W. Murray, and M.H. Wright, Numerical Linear Algebra and Optimization, vol. 1. Addison-Wesley, 1991.
[8] B. Leibe and B. Schiele, “Analyzing Appearance and Contour Based Methods for Object Categorization,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2003.
[9] M. Loog, R.P.W. Duin, and R. Haeb-Umbach, “Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 762-766, July 2001.
[10] R. Lotlikar and R. Kothari, “Fractional-Step Dimensionality Reduction,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 623-627, June 2000.
[11] J. Lu, K.N. Plataniotis, and A.N. Venetsanopoulos, “Face Recognition Using LDA-Based Algorithms,” IEEE Trans. Neural Networks, vol. 14, no. 1, pp. 195-200, 2003.
[12] A.M. Martinez and A.C. Kak, “PCA versus LDA,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, Feb. 2001.
[13] A.M. Martinez and M. Zhu, “Where Are Linear Feature Extraction Methods Applicable,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1934-1944, Dec. 2005.
[14] S. Michiels, S. Koscielny, and C. Hill, “Prediction of Cancer Outcome with Microarrays: A Multiple Random Validation Strategy,” Lancet, vol. 365, no. 9458, pp. 488-492, 2005.
[15] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K. Muller, “Fisher Discriminant Analysis with Kernels,” Proc. IEEE Neural Networks for Signal Processing Workshop, 1999.
[16] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz, “UCI Repository of Machine Learning Databases,” Dept. of Information and Computer Sciences, Univ. of California, Irvine, 1998.
[17] P.J. Phillips, P.J. Flynn, T. Scruggs, K.W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, “Overview of the Face Recognition Grand Challenge,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[18] C.R. Rao, Linear Statistical Inference and Its Applications, second ed. Wiley Interscience, 2002.
[19] M. Sommer, A. Olbrich, and M. Arendasy, “Improvements in Personnel Selection with Neural Networks: A Pilot Study in the Field of Aviation Psychology,” Int'l J. Aviation Psychology, vol. 14, no. 1, pp. 103-115, 2004.
[20] M.J. Schervish, “Linear Discrimination for Three Known Normal Populations,” J. Statistical Planning and Inference, vol. 10, pp. 167-175, 1984.
[21] J. Yang, G.W. Xu, Q.F. Hong, H.M. Liebich, K. Lutz, R.M. Schmulling, and H.G. Wahl, “Discrimination of Type 2 Diabetic Patients from Healthy Controls by Using Metabonomics Method Based on Their Serum Fatty Acid Profiles,” J. Chromatography B-Analytical Technologies in the Biomedical and Life Sciences, vol. 813, nos. 1-2, pp. 53-58, Dec. 25, 2004.
[22] J.P. Ye, T. Li, T. Xiong, and R. Janardan, “Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 1, no. 4, pp. 181-190, Oct.-Dec. 2004.

Index Terms:
Linear discriminant analysis, feature extraction, Bayes optimal, convex optimization, pattern recognition, data mining, data visualization
Onur C. Hamsici, Aleix M. Martinez, "Bayes Optimality in Linear Discriminant Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 647-657, April 2008, doi:10.1109/TPAMI.2007.70717