Nearest Neighbors by Neighborhood Counting
June 2006 (vol. 28 no. 6)
pp. 942-953
Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. It is used explicitly in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, we adopt this idea to develop a general methodology, neighborhood counting, for devising similarity functions. We shift the focus from neighbors to neighborhoods: regions of the data space that cover the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both points and propose the number of such neighborhoods as a measure of similarity. Neighborhoods can be defined in different ways for different types of data. Here, we consider one definition of neighborhood for multivariate data and derive a formula for the resulting similarity, called the neighborhood counting measure (NCM). NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity of the same order as the standard Euclidean distance function, is task independent, and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is thus shown experimentally to be sound for multivariate data; we expect it to extend to other types of data.
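The counting idea behind NCM can be illustrated with a minimal sketch. The assumptions here are mine, not the paper's exact derivation: each attribute is ordinal with values 1..n, a neighborhood is an axis-aligned interval [l, u] per attribute, and attributes contribute independently, so the similarity of two points is the product of per-attribute counts of intervals covering both values. The function names (`interval_count`, `ncm_similarity`) are hypothetical.

```python
# Sketch of neighborhood counting under the assumptions above.
# For one ordinal attribute with values 1..n, an interval [l, u]
# covers both a and b iff l <= min(a, b) and max(a, b) <= u, so the
# count is min(a, b) * (n - max(a, b) + 1): min(a, b) choices for l,
# n - max(a, b) + 1 choices for u.

def interval_count(a, b, n):
    """Number of intervals [l, u], 1 <= l <= u <= n, covering both a and b."""
    return min(a, b) * (n - max(a, b) + 1)

def interval_count_brute(a, b, n):
    """Brute-force check: enumerate every interval within 1..n."""
    return sum(1
               for l in range(1, n + 1)
               for u in range(l, n + 1)
               if l <= min(a, b) and max(a, b) <= u)

def ncm_similarity(x, y, domain_sizes):
    """Similarity of points x and y: product of per-attribute counts."""
    sim = 1
    for a, b, n in zip(x, y, domain_sizes):
        sim *= interval_count(a, b, n)
    return sim
```

Note how the count behaves like a similarity: identical values are covered by more intervals than distant ones (e.g., with n = 5, the pair (3, 3) is covered by 9 intervals, while (2, 4) is covered by only 4), and computing it takes constant time per attribute, consistent with the abstract's claim that NCM has the same order of complexity as Euclidean distance.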

[1] R.B. Ash and C. Doléans-Dade, Probability and Measure Theory. Academic Press, 2000.
[2] C.G. Atkeson, A.W. Moore, and S. Schaal, “Locally Weighted Learning,” Artificial Intelligence Rev., vol. 11, nos. 1-5, pp. 11-73, 1997.
[3] T. Bailey and A.K. Jain, “A Note on Distance-Weighted k-Nearest Neighbor Rules,” IEEE Trans. Systems, Man, and Cybernetics, vol. 8, no. 4, pp. 311-313, 1978.
[4] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases, 1998.
[5] E. Blanzieri and F. Ricci, “Probability Based Metrics for Nearest Neighbor Classification and Case-Based Reasoning,” Lecture Notes in Computer Science, vol. 1650, pp. 14-29, 1999.
[6] S. Cost and S. Salzberg, “A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features,” Machine Learning, vol. 10, pp. 57-78, 1993.
[7] T.M. Cover and P.E. Hart, “Nearest Neighbour Pattern Classification,” IEEE Trans. Information Theory, vol. 13, no. 1, pp. 21-27, 1967.
[8] Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, B.V. Dasarathy, ed. Los Alamitos, Calif.: IEEE CS Press, 1991.
[9] T. Denoeux, “A k-Nearest Neighbor Classification Rule Based on Dempster-Shafer Theory,” IEEE Trans. Systems, Man, and Cybernetics, vol. 25, pp. 804-813, 1995.
[10] P. Domingos, “Rule Induction and Instance-Based Learning: A Unified Approach,” Proc. 1995 Int'l Joint Conf. Artificial Intelligence, 1995.
[11] S.A. Dudani, “The Distance-Weighted k-Nearest-Neighbor Rule,” IEEE Trans. Systems, Man, and Cybernetics, vol. 6, pp. 325-327, 1976.
[12] C. Elkan, “Results of the KDD '99 Classifier Learning Contest,” Sept. 1999.
[13] E. Fix and J.L. Hodges, “Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties,” Technical Report TR4, US Air Force School of Aviation Medicine, Randolph Field, Tex., 1951.
[14] Wikimedia Foundation, Wikipedia, The Free Encyclopedia, 2006.
[15] P. Gardenfors, Conceptual Spaces: The Geometry of Thought. The MIT Press, 2000.
[16] D. Hand, H. Mannila, and P. Smyth, Principles of Data Mining. MIT Press, 2001.
[17] H. Hayashi, J. Sese, and S. Morishita, “Optimization of Nearest Neighborhood Parameters for KDD-2001 Cup ‘the Genomics Challenge’,” technical report, Univ. of Tokyo, 2001, WS/PDFfilesMorishita.pdf.
[18] T.M. Mitchell, Machine Learning. McGraw-Hill Companies, Inc., 1997.
[19] R.L. Morin and D.E. Raeside, “A Reappraisal of Distance-Weighted k-Nearest Neighbor Classification for Pattern Recognition with Missing Data,” IEEE Trans. Systems, Man, and Cybernetics, vol. 11, no. 3, pp. 241-243, 1981.
[20] H. Osborne and D. Bridge, “Models of Similarity for Case-Based Reasoning,” Proc. Interdisciplinary Workshop Similarity and Categorisation, pp. 173-179, 1997.
[21] J. Rachlin, S. Kasif, S. Salzberg, and D.W. Aha, “Towards a Better Understanding of Memory-Based and Bayesian Classifiers,” Proc. 11th Int'l Machine Learning Conf., pp. 242-250, 1994.
[22] S. Salzberg, “A Nearest Hyperrectangle Learning Method,” Machine Learning, vol. 6, pp. 251-276, 1991.
[23] G.W. Snedecor and W.G. Cochran, Statistical Methods. Ames, Iowa: Iowa State Univ. Press, 2002.
[24] C. Stanfill and D. Waltz, “Toward Memory-Based Reasoning,” Comm. ACM, vol. 29, pp. 1213-1229, 1986.
[25] S.S. Stevens, Mathematics, Measurement, and Psychophysics (Handbook of Experimental Psychology). Wiley, 1951.
[26] G. Towell, J. Shavlik, and M. Noordewier, “Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks,” Proc. Eighth Nat'l Conf. Artificial Intelligence, pp. 861-866, 1990.
[27] H. Wang, I. Düntsch, G. Gediga, and A. Skowron, “Hyperrelations in Version Space,” Int'l J. Approximate Reasoning, vol. 36, no. 3, pp. 223-241, 2004.
[28] D. Widdows, Geometry and Meaning. Univ. of Chicago Press, 2004.
[29] D.R. Wilson and T.R. Martinez, “Improved Heterogeneous Distance Functions,” J. Artificial Intelligence Research, vol. 6, pp. 1-34, 1997.

Index Terms:
Pattern recognition, machine learning, nearest neighbors, distance, similarity, neighborhood counting measure.
Hui Wang, "Nearest Neighbors by Neighborhood Counting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, pp. 942-953, June 2006, doi:10.1109/TPAMI.2006.126