Learning Weighted Metrics to Minimize Nearest-Neighbor Classification Error
July 2006 (vol. 28 no. 7)
pp. 1100-1110
Roberto Paredes and Enrique Vidal, IEEE Computer Society
To optimize the accuracy of the nearest-neighbor classification rule, a weighted distance is proposed, along with algorithms that automatically learn the corresponding weights. These weights may be specific to each class and feature, to each individual prototype, or to both. The learning algorithms are derived by (approximately) minimizing the leaving-one-out classification error on the given training set. The proposed approach is assessed through a series of experiments with UCI/Statlog corpora, as well as with a more specific text-classification task that entails a very sparse data representation and very high dimensionality. In all these experiments, the proposed approach performs uniformly well, with results comparable to or better than state-of-the-art results published with the same data so far.
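The idea in the abstract can be illustrated with a minimal sketch. The code below is not the authors' exact algorithm; it assumes a class-dependent weighted Euclidean distance and descends a sigmoid-smoothed surrogate of the leaving-one-out 1-NN error, where for each held-out point the ratio r = d_same / d_diff (distances to its nearest same-class and different-class prototypes) is below 1 exactly when the point is classified correctly. The function names, the window parameter `beta`, and the learning-rate schedule are all illustrative choices.

```python
import numpy as np

def weighted_dist(x, p, w):
    """Class-dependent weighted Euclidean distance: feature j is scaled by
    the weight w[j] attached to the prototype's class."""
    return np.sqrt(np.sum((w * (x - p)) ** 2)) + 1e-12

def nn_classify(x, X, y, W):
    """1-NN rule under the weighted distance; W[c] holds class c's weights."""
    d = [weighted_dist(x, X[i], W[y[i]]) for i in range(len(X))]
    return y[int(np.argmin(d))]

def learn_weights(X, y, n_classes, lr=0.5, beta=10.0, epochs=20):
    """Gradient descent on a sigmoid-smoothed leave-one-out 1-NN error.

    For each training point x_i (held out), r_i = d_same / d_diff < 1 means
    x_i would be classified correctly when left out.  We descend
    sum_i sigmoid(beta * (r_i - 1)); the sigmoid's derivative acts as a
    window that concentrates updates on borderline points.
    """
    n, dim = X.shape
    W = np.ones((n_classes, dim))
    for _ in range(epochs):
        for i in range(n):
            # Leave-one-out: sample i is excluded via an infinite distance.
            d = np.array([weighted_dist(X[i], X[j], W[y[j]])
                          if j != i else np.inf for j in range(n)])
            same = np.flatnonzero(y == y[i])
            diff = np.flatnonzero(y != y[i])
            s = same[np.argmin(d[same])]   # nearest same-class prototype
            t = diff[np.argmin(d[diff])]   # nearest different-class prototype
            r = d[s] / d[t]
            sig = 1.0 / (1.0 + np.exp(-beta * (r - 1.0)))
            win = beta * sig * (1.0 - sig)  # window factor (surrogate slope)
            # dr/dW for the two classes involved (chain rule on the metric):
            g_s = win * W[y[s]] * (X[i] - X[s]) ** 2 / (d[s] * d[t])
            g_t = -win * r * W[y[t]] * (X[i] - X[t]) ** 2 / d[t] ** 2
            W[y[s]] -= lr * g_s
            W[y[t]] -= lr * g_t
            np.clip(W, 1e-3, None, out=W)  # keep the metric valid
    return W
```

Pulling a point toward its nearest same-class prototype shrinks r while pushing it away from its nearest rival grows the denominator; on data where one feature separates the classes and another is noise, this drives the discriminative feature's weights up relative to the noisy one.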


Index Terms:
Weighted distances, nearest neighbor, leaving-one-out, error minimization, gradient descent.
Citation:
Roberto Paredes, Enrique Vidal, "Learning Weighted Metrics to Minimize Nearest-Neighbor Classification Error," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1100-1110, July 2006, doi:10.1109/TPAMI.2006.145