On Visualization and Aggregation of Nearest Neighbor Classifiers
October 2005 (vol. 27 no. 10)
pp. 1592-1602
Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. A major issue in k-nearest neighbor classification is how to find an optimal value of the neighborhood parameter k. In practice, this value is generally estimated by cross-validation. However, the ideal value of k in a classification problem depends not only on the entire data set, but also on the specific observation to be classified. Instead of using any single value of k, this paper studies results for a finite sequence of classifiers indexed by k. Along with the usual posterior probability estimates, a new measure, called the Bayesian measure of strength, is proposed and investigated as a measure of evidence for different classes. The results of these classifiers and their corresponding estimated misclassification probabilities are visually displayed using shaded strips. These plots provide an effective visualization of the evidence in favor of different classes when a given data point is to be classified. We also propose a simple weighted averaging technique that aggregates the results of different nearest neighbor classifiers to arrive at the final decision. Based on the analysis of several benchmark data sets, the proposed method is found to perform better than using any single value of k.
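The aggregation idea in the abstract can be illustrated with a minimal sketch (not the authors' exact method): for each k in a finite range, estimate class posteriors from the k nearest training labels, weight each classifier by its leave-one-out accuracy, and average the weighted posteriors. The function names, the squared-Euclidean distance, and the leave-one-out weighting scheme here are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of aggregating k-NN classifiers over k = 1..k_max by weighted
# averaging of posterior estimates; weights come from leave-one-out accuracy.
from collections import Counter

def knn_posteriors(train, labels, x, k):
    """Posterior estimate per class: fraction of the k nearest labels."""
    # Rank training points by squared Euclidean distance to x.
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = Counter(labels[i] for i in order[:k])
    return {c: votes.get(c, 0) / k for c in set(labels)}

def loo_accuracy(train, labels, k):
    """Leave-one-out accuracy of the k-NN rule; used as the weight for k."""
    correct = 0
    for i, x in enumerate(train):
        rest = train[:i] + train[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        post = knn_posteriors(rest, rest_labels, x, k)
        if max(post, key=post.get) == labels[i]:
            correct += 1
    return correct / len(train)

def aggregated_classify(train, labels, x, k_max):
    """Weighted average of posterior estimates over k = 1..k_max."""
    classes = set(labels)
    combined = {c: 0.0 for c in classes}
    total = 0.0
    for k in range(1, k_max + 1):
        w = loo_accuracy(train, labels, k)
        post = knn_posteriors(train, labels, x, k)
        for c in classes:
            combined[c] += w * post.get(c, 0.0)
        total += w
    # Normalize so the aggregated posteriors sum to one.
    return max(combined, key=combined.get), {c: v / total for c, v in combined.items()}
```

A query point is then assigned to the class with the largest aggregated posterior, so no single k has to be selected in advance.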

[1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms. London: Addison-Wesley, 1974.
[2] E. Alpaydin, “Voting over Multiple Condensed Nearest Neighbors,” Artificial Intelligence Rev., vol. 11, pp. 115-132, 1997.
[3] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees. Monterey, Calif.: Wadsworth & Brooks, 1984.
[4] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, pp. 123-140, 1996.
[5] L. Breiman, “Arcing Classifiers (with Discussion),” Annals of Statistics, vol. 26, pp. 801-849, 1998.
[6] P. Chaudhuri and J.S. Marron, “SiZer for Exploration of Structures in Curves,” J. Am. Statistical Assoc., vol. 94, pp. 807-823, 1999.
[7] C.A. Cooley and S.N. MacEachern, “Classification via Kernel Product Estimators,” Biometrika, vol. 85, pp. 823-833, 1998.
[8] T.M. Cover and P.E. Hart, “Nearest Neighbor Pattern Classification,” IEEE Trans. Information Theory, vol. 13, pp. 21-27, 1967.
[9] Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, B.V. Dasarathy, ed. Washington: IEEE CS Press, 1991.
[10] P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London: Prentice Hall, 1982.
[11] R. Duda, P. Hart, and D.G. Stork, Pattern Classification. New York: Wiley, 2000.
[12] E. Fix and J.L. Hodges, “Discriminatory Analysis— Nonparametric Discrimination: Consistency Properties,” Project 21-49-004, Report 4, US Air Force School of Aviation Medicine, Randolph Field, pp. 261-279, 1951.
[13] J.H. Friedman, “Flexible Metric Nearest Neighbor Classification,” technical report, Dept. of Statistics, Stanford Univ., 1996.
[14] J.H. Friedman, “On Bias, Variance, 0-1 Loss, and the Curse of Dimensionality,” Data Mining and Knowledge Discovery, vol. 1, pp. 55-77, 1997.
[15] J.H. Friedman, T. Hastie, and R. Tibshirani, “Additive Logistic Regression: A Statistical View of Boosting (with Discussion),” Annals of Statistics, vol. 28, pp. 337-407, 2000.
[16] K. Fukunaga and L.D. Hostetler, “Optimization of k-Nearest Neighbor Density Estimates,” IEEE Trans. Information Theory, vol. 19, pp. 320-326, 1973.
[17] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1990.
[18] A.K. Ghosh, P. Chaudhuri, and D. Sengupta, “Multi-Scale Kernel Discriminant Analysis,” Proc. Fifth Int'l Conf. Advances in Pattern Recognition, D.P. Mukherjee and S. Pal, eds., pp. 89-93, 2003.
[19] A.K. Ghosh, P. Chaudhuri, and D. Sengupta, “Classification Using Kernel Density Estimates: Multi-Scale Analysis and Visualization,” Technometrics, pending publication.
[20] F. Godtliebsen, J.S. Marron, and P. Chaudhuri, “Significance in Scale Space for Bivariate Density Estimation,” J. Computational and Graphical Statistics, vol. 11, pp. 1-22, 2002.
[21] P.E. Hart, “The Condensed Nearest Neighbor Rule,” IEEE Trans. Information Theory, vol. 14, pp. 515-516, 1968.
[22] T. Hastie and R. Tibshirani, “Discriminant Adaptive Nearest Neighbor Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, pp. 607-616, 1996.
[23] T. Hastie, R. Tibshirani, and J.H. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Verlag, 2001.
[24] T.K. Ho, J.J. Hull, and S.N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 66-75, 1994.
[25] C.C. Holmes and N.M. Adams, “A Probabilistic Nearest Neighbor Method for Statistical Pattern Recognition,” J. Royal Statistical Soc., B, vol. 64, pp. 295-306, 2002.
[26] C.C. Holmes and N.M. Adams, “Likelihood Inference in Nearest-Neighbor Classification Methods,” Biometrika, vol. 90, pp. 99-112, 2003.
[27] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis. Prentice Hall, 1992.
[28] P.A. Lachenbruch and M.R. Mickey, “Estimation of Error Rates in Discriminant Analysis,” Technometrics, vol. 10, pp. 1-11, 1968.
[29] D.O. Loftsgaarden and C.P. Quesenberry, “A Nonparametric Estimate of a Multivariate Density Function,” Ann. Math. Statistics, vol. 36, pp. 1049-1051, 1965.
[30] P.C. Mahalanobis, “On the Generalized Distance in Statistics,” Proc. Nat'l Inst. of Sciences of India, vol. 2, pp. 49-55, 1936.
[31] G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. New York: Wiley, 1992.
[32] P. Mitra, C.A. Murthy, and S.K. Pal, “Density Based Multiscale Data Condensation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 734-747, 2002.
[33] D. Opitz and R. Maclin, “Popular Ensemble Methods: An Empirical Study,” J. Artificial Intelligence Research, vol. 11, pp. 169-198, 1999.
[34] M. Paik and Y. Yang, “Combining Nearest Neighbor Classifiers versus Cross-Validation Selection,” Statistical Applications in Genetics and Molecular Biology, vol. 3, 2004.
[35] S.K. Pal, S. Bandopadhyay, and C.A. Murthy, “Genetic Algorithms for Generation of Class Boundaries,” IEEE Trans. Systems, Man, and Cybernetics, vol. 28, pp. 816-828, 1998.
[36] G.E. Peterson and H.L. Barney, “Control Methods Used in a Study of Vowels,” J. Acoustical Soc. Am., vol. 24, pp. 175-185, 1952.
[37] B.D. Ripley, Pattern Recognition and Neural Networks. Cambridge Univ. Press, 1996.
[38] R.E. Schapire, Y. Freund, P. Bartlett, and W. Lee, “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods,” Ann. Statistics, vol. 26, pp. 1651-1686, 1998.
[39] D.B. Skalak, “Prototype Selection for Composite Nearest Neighbor Classifiers,” PhD thesis, Dept. of Computer Science, Univ. of Massachusetts, 1996.
[40] M. Stone, “Cross Validation: A Review,” Mathematische Operationsforschung und Statistik, Series Statistics, vol. 9, pp. 127-139, 1977.

Index Terms:
Bayesian strength function, misclassification rates, multiscale visualization, neighborhood parameter, posterior probability, prior distribution, weighted averaging.
Anil K. Ghosh, Probal Chaudhuri, C.A. Murthy, "On Visualization and Aggregation of Nearest Neighbor Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1592-1602, Oct. 2005, doi:10.1109/TPAMI.2005.204