Probability Density Estimation from Optimally Condensed Data Samples
October 2003 (vol. 25 no. 10)
pp. 1253-1264

Abstract—The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator that provides a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the L_2 sense. While only requiring O(N^2) optimization routines to estimate the required kernel weighting coefficients, the proposed method provides similar levels of performance accuracy and sparseness of representation as Support Vector Machine density estimation, which requires O(N^3) optimization routines, and which has previously been shown to consistently outperform Gaussian Mixture Models. It is also demonstrated that the proposed density estimator consistently provides superior density estimates for similar levels of data reduction to that provided by the recently proposed Density-Based Multiscale Data Condensation algorithm and, in addition, has comparable computational scaling. The additional advantage of the proposed method is that no extra free parameters are introduced such as regularization, bin width, or condensation ratios, making this method a very simple and straightforward approach to providing a reduced set density estimator with comparable accuracy to that of the full sample Parzen density estimator.

[1] M.M. Astrahan, "Speech Analysis by Clustering, or the Hyperplane Method," Stanford A.I. Project Memo, Stanford Univ., Calif., 1970.
[2] G.A. Babich and O. Camps, "Weighted Parzen Windows for Pattern Classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 5, pp. 567-570, May 1996.
[3] C. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1995.
[4] A. Elgammal, D. Harwood, and L. Davis, "Nonparametric Model for Background Subtraction," Proc. Sixth European Conf. Computer Vision, pp. 751-761, 2000.
[5] K. Fukunaga and R.R. Hayes, "The Reduced Parzen Classifier," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 4, pp. 423-425, Apr. 1989.
[6] K. Fukunaga and J.M. Mantock, "Nonparametric Data Reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, pp. 115-118, 1984.
[7] M. Girolami, "Orthogonal Series Density Estimation and the Kernel Eigenvalue Problem," Neural Computation, vol. 14, no. 3, pp. 669-688, 2002.
[8] T. Hastie and W. Stuetzle, "Principal Curves," J. Am. Statistical Assoc., vol. 84, no. 406, pp. 502-516, 1989.
[9] L. Holmström, "The Error and the Computational Complexity of a Multivariate Binned Kernel Density Estimator," J. Multivariate Analysis, vol. 72, no. 2, pp. 264-309, 2000.
[10] M. Gyssens, J. Paredaens, and D. Van Gucht, "A Graph-Oriented Object Database Model," Proc. Ninth ACM Symp. Principles of Database Systems, pp. 417-424, Apr. 1990.
[11] A.J. Izenman, "Recent Developments in Nonparametric Density Estimation," J. Am. Statistical Assoc., vol. 86, pp. 205-224, 1991.
[12] B. Jeon and D.A. Landgrebe, "Fast Parzen Density Estimation Using Clustering-Based Branch and Bound," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 950-954, Sept. 1994.
[13] D. Kim, "Least Squares Mixture Decomposition Estimation," unpublished doctoral dissertation, Dept. of Statistics, Virginia Polytechnic Inst. and State Univ., 1995.
[14] T. Kohonen, Self-Organizing Maps. Springer-Verlag, 1995.
[15] C. Lambert, S. Harrington, C. Harvey, and A. Glodjo, "Efficient Online Nonparametric Kernel Density Estimation," Algorithmica, vol. 25, pp. 37-57, 1999.
[16] E.L. Lehmann, Nonparametric Statistical Methods Based on Ranks. New York: McGraw-Hill, 1975.
[17] G. McLachlan and D. Peel, Finite Mixture Models. Wiley, 2000.
[18] P. Mitra, C.A. Murthy, and S.K. Pal, "Density-Based Multiscale Data Condensation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 6, June 2002.
[19] S. Mukherjee and V. Vapnik, "Support Vector Method for Multivariate Density Estimation," CBCL Paper #170, AI Memo #1653, 1999.
[20] E. Parzen, "On Estimation of a Probability Density Function and Mode," Annals of Math. Statistics, vol. 33, pp. 1065-1076, 1962.
[21] C.E. Priebe and D.J. Marchette, "Alternating Kernel and Mixture Density Estimates," Computational Statistics and Data Analysis, vol. 35, pp. 43-65, 2000.
[22] S. Roberts, "Extreme Value Statistics for Novelty Detection in Biomedical Signal Processing," IEE Proc. Science, Technology, and Measurement, vol. 47, no. 6, pp. 363-367, 2000.
[23] S. Sain, "Adaptive Kernel Density Estimation," PhD thesis, Rice Univ., 1994.
[24] D.W. Scott, "Remarks on Fitting and Interpreting Mixture Models," Computing Science and Statistics, K. Berk and M. Pourahmadi, eds., vol. 31, pp. 104-109, 1999.
[25] D.W. Scott and S.J. Sheather, "Kernel Density Estimation with Binned Data," Comm. Statistics: Theory and Methods, vol. 14, pp. 1353-1359, 1985.
[26] D.W. Scott and W.F. Szewczyk, "From Kernels to Mixtures," Technometrics, vol. 43, pp. 323-335, 2001.
[27] F. Sha, L. Saul, and D.D. Lee, "Multiplicative Updates for Non-Negative Quadratic Programming in Support Vector Machines," Technical Report MS-CIS-02-19, Univ. of Pennsylvania, 2002.
[28] B.W. Silverman, "Kernel Density Estimation Using the Fast Fourier Transform," Applied Statistics, vol. 31, pp. 93-99, 1982.
[29] B.W. Silverman, Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
[30] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson, "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, pp. 1443-1471, 2001.
[31] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[32] D.M.J. Tax and R.P.W. Duin, "Support Vector Data Description," Pattern Recognition Letters, vol. 20, nos. 11-13, pp. 1191-1199, 1999.
[33] V.N. Vapnik, Statistical Learning Theory. New York: John Wiley and Sons, 1998.
[34] V. Vapnik and S. Mukherjee, "Support Vector Method for Multivariate Density Estimation," Advances in Neural Information Processing Systems, S. Solla, T. Leen, and K.-R. Müller, eds., MIT Press, pp. 659-665, 2000.
[35] J. Weston, A. Gammerman, M.O. Stitson, V. Vapnik, V. Vovk, and C. Watkins, "Support Vector Density Estimation," Advances in Kernel Methods, MIT Press, 1999.

Index Terms:
Kernel density estimation, Parzen window, data condensation, sparse representation.
Citation:
Mark Girolami, Chao He, "Probability Density Estimation from Optimally Condensed Data Samples," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1253-1264, Oct. 2003, doi:10.1109/TPAMI.2003.1233899