Issue No.07 - July (2009 vol.31)
pp: 1153-1164
Probal Chaudhuri, Indian Statistical Institute, Kolkata
Anil K. Ghosh, Indian Statistical Institute, Kolkata
Hannu Oja, University of Tampere, Finland
ABSTRACT
Parametric methods of classification assume specific parametric models for the competing population densities (e.g., Gaussian population densities can lead to linear and quadratic discriminant analysis), and they work well when these model assumptions are valid. Violation of one or more of these assumptions often leads to a poor classifier. Nonparametric classifiers (e.g., nearest-neighbor and kernel-based classifiers), on the other hand, are more flexible and free from parametric model assumptions, but their statistical instability may lead to poor performance when the training sample is small. Moreover, nonparametric methods do not use any parametric structure of the population densities, so even when some additional information about these densities is available, it is not used to modify the nonparametric classification rule. This paper attempts to overcome these limitations of the parametric and nonparametric approaches by combining their strengths to develop hybrid classification methods. Simulated examples and benchmark data sets are used to examine the performance of these hybrid discriminant analysis tools, and asymptotic results on their misclassification rates are derived under appropriate regularity conditions.
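The abstract does not spell out how the hybrid classifiers are built, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one-dimensional data, a Gaussian parametric model, a Gaussian kernel density estimate, and a convex combination of the two density estimates; the function names, the `weight` parameter, the bandwidth value, and the simulated samples are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def gaussian_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def parametric_density(sample):
    """Parametric estimate: fit a single Gaussian to the sample."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return lambda x: gaussian_pdf(x, mean, sd)


def kernel_density(sample, bandwidth):
    """Nonparametric estimate: Gaussian kernel density estimate."""
    n = len(sample)
    return lambda x: sum(gaussian_pdf(x, xi, bandwidth) for xi in sample) / n


def hybrid_density(sample, bandwidth, weight):
    """Assumed hybrid: weight * parametric + (1 - weight) * kernel estimate.

    weight = 1 gives the purely parametric estimate, weight = 0 the purely
    nonparametric one. This mixing rule is an illustration only.
    """
    par = parametric_density(sample)
    kde = kernel_density(sample, bandwidth)
    return lambda x: weight * par(x) + (1.0 - weight) * kde(x)


def classify(x, densities, priors):
    """Plug-in Bayes rule: pick the class maximizing prior * density."""
    scores = [p * f(x) for f, p in zip(densities, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Simulated training data (hypothetical): two well-separated classes.
rng = random.Random(0)
class0 = [rng.gauss(0.0, 1.0) for _ in range(30)]
class1 = [rng.gauss(3.0, 1.0) for _ in range(30)]

densities = [hybrid_density(s, bandwidth=0.5, weight=0.5)
             for s in (class0, class1)]
priors = [0.5, 0.5]

print(classify(-1.0, densities, priors), classify(4.0, densities, priors))
```

In this sketch, `weight` controls how much the rule trusts the parametric model. In practice it and the bandwidth would have to be chosen from the data, for example by cross-validated misclassification rate.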
INDEX TERMS
Bayes risk, bandwidth, kernel density estimation, LDA, misclassification rate, multiscale smoothing, nearest neighbor, QDA.
CITATION
Probal Chaudhuri, Anil K. Ghosh, Hannu Oja, "Classification Based on Hybridization of Parametric and Nonparametric Classifiers", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 7, pp. 1153-1164, July 2009, doi:10.1109/TPAMI.2008.149