

IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 10, October 2010

pp. 1822-1831

JooSeuk Kim, University of Michigan, Ann Arbor

Clayton D. Scott, University of Michigan, Ann Arbor

ABSTRACT

Nonparametric kernel methods are widely used and have proven successful in many statistical learning problems. Well-known examples include the kernel density estimate (KDE) for density estimation and the support vector machine (SVM) for classification. We propose a kernel classifier that optimizes the L_2, or integrated squared error (ISE), of a “difference of densities.” We focus on the Gaussian kernel, although the method applies to other kernels suitable for density estimation. Like the SVM, the classifier is sparse and results from solving a quadratic program. We provide statistical performance guarantees for the proposed L_2 kernel classifier in the form of a finite sample oracle inequality and strong consistency in the sense of both ISE and probability of error. A special case of our analysis applies to a previously introduced ISE-based method for kernel density estimation. For dimensionality greater than 15, the basic L_2 kernel classifier performs poorly in practice. Thus, we extend the method through the introduction of a natural regularization parameter, which allows it to remain competitive with the SVM in high dimensions. Simulation results for both synthetic and real-world data are presented.
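The idea the abstract describes can be sketched in code. The following is a simplified illustration, not the authors' exact formulation: it models the difference of densities as a weighted sum of Gaussian kernels with signed class labels, uses the Gaussian convolution identity to write the ∫ĝ² term of the ISE in closed form (a bandwidth-σ√2 Gram matrix), estimates the cross term by naive resubstitution, and keeps only a nonnegativity constraint on the weights, solving the resulting convex quadratic program with a generic bounded solver rather than the paper's SMO-style algorithm. All function and variable names here are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def gauss_kernel(X, Y, sigma):
    """Gaussian density kernel (2*pi*sigma^2)^(-d/2) * exp(-||x-y||^2 / (2*sigma^2))."""
    d = X.shape[1]
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return (2 * np.pi * sigma**2) ** (-d / 2) * np.exp(-sq / (2 * sigma**2))

def fit_l2_kernel_classifier(X, y, sigma=1.0):
    """Fit weights alpha >= 0 for g_hat(x) = sum_i alpha_i * y_i * k_sigma(x, x_i),
    with y in {+1, -1}, by minimizing an ISE-style quadratic objective."""
    n = len(y)
    # Closed form for the integral of g_hat^2: convolving two Gaussians of
    # bandwidth sigma yields a Gaussian of bandwidth sigma*sqrt(2).
    Q = np.outer(y, y) * gauss_kernel(X, X, sigma * np.sqrt(2))
    # Resubstitution estimate of the cross term: integral of g_hat against
    # (f_plus - f_minus), approximated on the training sample itself.
    K = gauss_kernel(X, X, sigma)
    pos, neg = y > 0, y < 0
    c = y * (K[:, pos].mean(axis=1) - K[:, neg].mean(axis=1))
    # Convex QP: minimize alpha' Q alpha - 2 c' alpha subject to alpha >= 0.
    fun = lambda a: a @ Q @ a - 2 * c @ a
    jac = lambda a: 2 * Q @ a - 2 * c
    res = minimize(fun, np.full(n, 1.0 / n), jac=jac,
                   method="L-BFGS-B", bounds=[(0, None)] * n)
    return res.x

def predict(alpha, Xtrain, ytrain, Xtest, sigma=1.0):
    """Classify by the sign of the estimated difference of densities."""
    g = gauss_kernel(Xtest, Xtrain, sigma) @ (alpha * ytrain)
    return np.sign(g)
```

Because many weights are driven to the zero boundary by the nonnegativity constraint, the resulting decision rule is sparse, echoing the SVM-like sparsity claimed in the abstract.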

INDEX TERMS

Kernel methods, sparse classifiers, integrated squared error, difference of densities, SMO algorithm.

CITATION

JooSeuk Kim, Clayton D. Scott, "L₂ Kernel Classification",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 32, no. 10, pp. 1822-1831, October 2010, doi:10.1109/TPAMI.2009.188