Data Compression and Local Metrics for Nearest Neighbor Classification
April 1999 (vol. 21 no. 4)
pp. 380-384

Abstract—A local distance measure for the nearest neighbor classification rule is shown to achieve high compression rates and high accuracy on real data sets. In the approach proposed here, a set of prototypes is first extracted during training, and a feedback learning algorithm is then used to optimize the metric. Even when the prototypes are randomly selected, the proposed metric outperforms, both in compression rate and in accuracy, common editing procedures like ICA, RNN, and PNN. Finally, when accuracy is the major concern, we show how compression can be traded for accuracy by exploiting voting techniques. This indicates that voting can be successfully integrated with instance-based approaches, overcoming previous negative results.
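The core idea in the abstract, a nearest neighbor rule where each stored prototype carries its own feature weights (a local metric), can be sketched as follows. This is an illustrative sketch only: the paper's actual metric and its feedback-based weight optimization are not reproduced here, and the function and variable names are hypothetical.

```python
import numpy as np

def local_weighted_nn(prototypes, weights, labels, x):
    """Classify x with a 1-NN rule under a local metric: each prototype
    has its own per-feature weight vector, so the distance measure
    varies across the instance space. Illustrative sketch only."""
    # Weighted squared Euclidean distance from x to each prototype,
    # computed with that prototype's own weight vector.
    dists = np.array([np.sum(w * (x - p) ** 2)
                      for p, w in zip(prototypes, weights)])
    # Return the label of the nearest prototype under the local metric.
    return labels[int(np.argmin(dists))]

# Two prototypes with uniform weights: plain 1-NN behavior.
protos = np.array([[0.0, 0.0], [1.0, 1.0]])
wts = np.array([[1.0, 1.0], [1.0, 1.0]])
labs = ["a", "b"]
print(local_weighted_nn(protos, wts, labs, np.array([0.1, 0.2])))  # -> a
```

Optimizing `weights` per prototype (the paper does this with a feedback learning algorithm) lets a small prototype set, even a randomly chosen one, cover the data well, which is the source of the reported compression.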

[1] D. Aha, "Tolerating Noisy, Irrelevant, and Novel Attributes in Instance-Based Learning Algorithms," Int'l J. Man-Machine Studies, vol. 36, no. 2, pp. 267-287, 1992.
[2] D.W. Aha and R.L. Goldstone, "Concept Learning and Flexible Weighting," Proc. 14th Ann. Conf. Cognitive Science Soc., pp. 534-539, Bloomington, Ind., 1992.
[3] E. Alpaydin, "Voting Over Multiple Condensed Nearest Neighbors," AI Review J., vol. 11, pp. 115-132, 1997.
[4] S.D. Bay, "Nearest Neighbor Classification From Multiple Feature Subsets," Machine Learning, 1998.
[5] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, pp. 123-140, 1996.
[6] L. Breiman, "Bias, Variance, and Arcing Classifiers," Technical Report 460, Univ. of California, Berkeley, Apr. 1996.
[7] Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, B.V. Dasarathy, ed. Los Alamitos, Calif.: IEEE CS Press, 1991.
[8] T. Hastie and R. Tibshirani, “Discriminant Adaptive Nearest Neighbor Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 6, pp. 607-615, June 1996.
[9] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, N.J.: Prentice Hall, 1988.
[10] T. Kohonen, "The Self-Organizing Map," Proc. IEEE, vol. 78, no. 9, pp. 1464-1480, Sept. 1990.
[11] D.L. Medin and E.J. Shoben, "Context and Structure in Conceptual Combination," Cognitive Psychology, vol. 20, pp. 158-190, 1988.
[12] C.J. Merz and P.M. Murphy, UCI Repository of Machine Learning Databases. Dept. of Information and Computer Science, Univ. of California, Irvine, 1996.
[13] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1992.
[14] F. Ricci and D.W. Aha, "Error-Correcting Output Codes for Local Learners," Proc. 10th European Conf. Machine Learning, Chemnitz, Germany, Apr. 1998.
[15] F. Ricci and P. Avesani, "Nearest Neighbor Classification With a Local Asymmetrically Weighted Metric," Technical Report no. 9602-01, IRST, Feb. 1996.
[16] D.W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization. New York: John Wiley, 1992.
[17] R. Short and K. Fukunaga, "The Optimal Distance Measure for Nearest Neighbor Classification," IEEE Trans. Information Theory, vol. 27, pp. 622-627, 1981.
[18] D.B. Skalak, "Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms," Proc. 11th Int'l Machine Learning Conf., pp. 293-301, New Brunswick, N.J., 1994.
[19] C. Stanfill and D. Waltz, “Toward Memory-Based Reasoning,” Comm. ACM, vol. 29, pp. 1213-1228, 1986.
[20] D. Wettschereck, D. Aha, and T. Mohri, "A Review and Empirical Comparison of Feature Weighting Methods for a Class of Lazy Learning Algorithms," Artificial Intelligence Rev., vol. 11, nos. 1-5, pp. 237-314, Feb. 1997.
[21] D.R. Wilson and T.R. Martinez, "Improved Heterogeneous Distance Functions," J. Artificial Intelligence Research, vol. 11, pp. 1-34, 1997.

Index Terms:
Nearest neighbor, data compression, machine learning, local metric, multiple models, case-based reasoning.
Francesco Ricci, Paolo Avesani, "Data Compression and Local Metrics for Nearest Neighbor Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 380-384, April 1999, doi:10.1109/34.761268