IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 3, March 2011

pp: 631-638

Di You, The Ohio State University, Columbus

Onur C. Hamsici, The Ohio State University, Columbus

Aleix M. Martinez, The Ohio State University, Columbus

ABSTRACT

Kernel mapping is one of the most used approaches to intrinsically derive nonlinear classifiers. The idea is to use a kernel function which maps the original nonlinearly separable problem to a space of intrinsically larger dimensionality where the classes are linearly separable. A major problem in the design of kernel methods is to find the kernel parameters that make the problem linear in the mapped representation. This paper derives the first criterion that specifically aims to find a kernel representation where the Bayes classifier becomes linear. We illustrate how this result can be successfully applied in several kernel discriminant analysis algorithms. Experimental results, using a large number of databases and classifiers, demonstrate the utility of the proposed approach. The paper also shows (theoretically and experimentally) that a kernel version of Subclass Discriminant Analysis yields the highest recognition rates.
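The kernel-mapping idea described in the abstract can be illustrated with a toy sketch. This is not the criterion derived in the paper; it only shows, under assumed data (two concentric rings) and a hand-picked feature map, how a problem that is not linearly separable in the input space becomes linearly separable after mapping. The names `ring`, `phi`, `kernel`, and `classify`, and the hyperplane parameters `w` and `b`, are hypothetical and chosen for illustration only.

```python
import math

def ring(radius, n):
    """Return n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def phi(x):
    """Assumed explicit feature map: append the squared norm as a third coordinate."""
    return (x[0], x[1], x[0] ** 2 + x[1] ** 2)

def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def kernel(x, y):
    """Kernel that equals phi(x) . phi(y) but is computed from the 2-D inputs."""
    return dot(x, y) + dot(x, x) * dot(y, y)

# Two classes that no straight line in the 2-D input space can separate.
inner = ring(1.0, 8)  # class 0
outer = ring(3.0, 8)  # class 1

# In the mapped 3-D space a single hyperplane w . z + b = 0 separates them.
w, b = (0.0, 0.0, 1.0), -5.0

def classify(x):
    """Linear decision rule applied in the mapped space."""
    return 1 if dot(w, phi(x)) + b > 0 else 0

predictions_inner = [classify(x) for x in inner]
predictions_outer = [classify(x) for x in outer]

# The kernel reproduces the mapped-space inner product for every pair of samples.
points = inner + outer
kernel_matches_map = all(
    math.isclose(kernel(x, y), dot(phi(x), phi(y)), abs_tol=1e-9)
    for x in points for y in points
)
```

Here the map is fixed by hand. In practice the kernel has free parameters, and choosing them so that the mapped classes become linearly separable is the problem the paper addresses.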

INDEX TERMS

Kernel functions, kernel optimization, feature extraction, discriminant analysis, nonlinear classifiers, face recognition, object recognition, pattern recognition, machine learning.

CITATION

Di You, Onur C. Hamsici, Aleix M. Martinez, "Kernel Optimization in Discriminant Analysis",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 33, no. 3, pp. 631-638, March 2011, doi:10.1109/TPAMI.2010.173