Issue No. 11 - November (2007 vol. 19)
This work has two main objectives: to introduce a novel algorithm, called the Fast Condensed Nearest Neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule, and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows the triangle inequality to be exploited effectively to reduce the computational effort. The FCNN rule outperformed variants of existing competence preservation methods, including variants enhanced in this work, in terms of both learning speed and learning scaling behavior, and often in terms of the size of the model, while guaranteeing the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and MIT Face databases, and it computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
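To make the notion of a training-set-consistent subset concrete, the following is a minimal Python sketch of an incremental condensation loop. It is an illustration under stated assumptions, not the paper's reference implementation: the abstract does not spell out the FCNN steps, so the seeding choice (one point per class, the one closest to the class centroid) and the growth rule (for each subset member, add the closest misclassified point among those it is nearest to) are assumptions, as are the function names `condense`, `nearest`, and `is_consistent`. The sketch uses brute-force distance computations and does not exploit the triangle inequality. It also assumes that no two identical points carry different labels.

```python
import math


def nearest(subset, points, x):
    """Return the index of the subset member closest to x (ties go to the lower index)."""
    return min(subset, key=lambda i: (math.dist(points[i], x), i))


def is_consistent(subset, points, labels):
    """True if the 1-NN rule over `subset` classifies every training point correctly."""
    return all(labels[nearest(subset, points, x)] == labels[i]
               for i, x in enumerate(points))


def condense(points, labels):
    """Grow a subset until it is training-set consistent (hypothetical sketch)."""
    dim = len(points[0])
    subset = []
    # Assumed seeding: for each class, the training point closest to the class centroid.
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        subset.append(min(idx, key=lambda i: (math.dist(points[i], centroid), i)))
    chosen = set(subset)

    while True:
        # For each subset member p, remember the closest point that p misclassifies.
        representative = {}
        for i, x in enumerate(points):
            if i in chosen:
                continue
            p = nearest(subset, points, x)
            if labels[p] != labels[i]:
                d = math.dist(points[p], x)
                if p not in representative or d < representative[p][0]:
                    representative[p] = (d, i)
        if not representative:
            return sorted(subset)  # no misclassified point is left
        for _, i in representative.values():
            subset.append(i)
            chosen.add(i)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    labs = [0, 0, 0, 0, 1, 1, 1, 1]
    kept = condense(pts, labs)
    print(kept, is_consistent(kept, pts, labs))
```

Each pass adds at least one previously unselected point, so the loop terminates, and on exit every training point outside the subset is correctly classified by its nearest subset member. Because each pass scans all training points against the current subset, the brute-force cost grows with the product of the training set size and the subset size, which is the kind of work that distance-bound pruning aims to reduce.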
Clustering, classification, and association rules, Data mining
F. Angiulli, "Fast Nearest Neighbor Condensation for Large Data Sets Classification," in IEEE Transactions on Knowledge & Data Engineering, vol. 19, no. 11, pp. 1450-1464, 2007.