The Community for Technology Leaders
Green Image
ABSTRACT
A novel similarity, neighborhood counting measure, was recently proposed which counts the number of neighborhoods of a pair of data points. This similarity can handle numerical and categorical attributes in a conceptually uniform way, can be calculated efficiently through a simple formula, and gives good performance when tested in the framework of k-nearest neighbor classifier. In particular it consistently outperforms a combination of the classical Euclidean distance and Hamming distance. This measure was also shown to be related to a probability formalism, G probability, which is induced from a target probability function P. It was however unclear how G is related to P, especially for classification. Therefore it was not possible to explain some characteristic features of the neighborhood counting measure. In this paper we show that G is a linear function of P, and G-based Bayes classification is equivalent to P-based Bayes classification. We also show that the k-nearest neighbor classifier, when weighted by the neighborhood counting measure, is in fact an approximation of the G-based Bayes classifier, and furthermore, the P-based Bayes classifier. Additionally we show that the neighborhood counting measure remains unchanged when binary attributes are treated as categorical or numerical data. This is a feature that is not shared by other distance measures, to the best of our knowledge. This study provides a theoretical insight into the neighborhood counting measure.
INDEX TERMS
Decision support, Clustering, classification, and association rules
CITATION
Fionn Murtagh, Hui Wang, "A Study of the Neighborhood Counting Similarity", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. , pp. 449-461, April 2008, doi:10.1109/TKDE.2007.190721
93 ms
(Ver )