Class Conditional Nearest Neighbor for Large Margin Instance Selection
February 2010 (vol. 32, no. 2)
pp. 364-370
Elena Marchiori, Radboud University, Nijmegen
This paper presents a relational framework for studying properties of labeled data points related to proximity and labeling information in order to improve the performance of the 1NN rule. Specifically, the class conditional nearest neighbor (ccnn) relation over pairs of points in a labeled training set is introduced. For a given class label c, this relation associates to each point a its nearest neighbor computed among only those points with class label c (excluded a). A characterization of ccnn in terms of two graphs is given. These graphs are used for defining a novel scoring function over instances by means of an information-theoretic divergence measure applied to the degree distributions of these graphs. The scoring function is employed to develop an effective large margin instance selection method, which is empirically demonstrated to improve storage and accuracy performance of the 1NN rule on artificial and real-life data sets.

[1] D.W. Aha, D. Kibler, and M.K. Albert, “Instance-Based Learning Algorithms,” Machine Learning, vol. 6, pp. 37-66, 1991.
[2] F. Angiulli, “Fast Nearest Neighbor Condensation for Large Data Sets Classification,” IEEE Trans. Knowledge and Data Eng., vol. 19, no. 11, pp. 1450-1464, Nov. 2007.
[3] F. Angiulli and G. Folino, “Distributed Nearest Neighbor-Based Condensation of Very Large Data Sets,” IEEE Trans. Knowledge and Data Eng., vol. 19, no. 12, pp. 1593-1606, Dec. 2007.
[4] V. Barnett, “The Ordering of Multivariate Data,” J. Royal Statistical Soc., vol. 139, no. 3, pp. 318-355, 1976.
[5] P.L. Bartlett, “For Valid Generalization, the Size of the Weights Is More Important than the Size of the Network,” Advances in Neural Information Processing Systems 9, pp. 134-140, The MIT Press, 1997.
[6] B. Hammer, M. Strickert, and T. Villmann, “On the Generalization Ability of GRLVQ Networks,” Neural Processing Letters, vol. 21, no. 2, pp. 109-120, 2005.
[7] H. Brighton and C. Mellish, “On the Consistency of Information Filters for Lazy Learning Algorithms,” Proc. Third European Conf. Principles of Data Mining and Knowledge Discovery, pp. 283-288, 1999.
[8] H. Brighton and C. Mellish, “Advances in Instance Selection for Instance-Based Learning Algorithms,” Data Mining and Knowledge Discovery, vol. 6, pp. 153-172, 2002.
[9] R.M. Cameron-Jones, “Instance Selection by Encoding Length Heuristic with Random Mutation Hill Climbing,” Proc. Eighth Australian Joint Conf. Artificial Intelligence, pp. 99-106, 1995.
[10] O. Chapelle and A. Zien, “Semi-Supervised Classification by Low Density Separation,” Proc. 10th Int'l Workshop Artificial Intelligence and Statistics, pp. 57-64, 2005.
[11] C. Cortes and V. Vapnik, “Support Vector Networks,” Machine Learning, vol. 20, pp. 273-297, 1995.
[12] T. Cover and P. Hart, “Nearest Neighbor Pattern Classification,” IEEE Trans. Information Theory, vol. 13, no. 1, pp. 21-27, Jan. 1967.
[13] K. Crammer, R. Gilad-Bachrach, A. Navot, and N. Tishby, “Margin Analysis of the LVQ Algorithm,” Proc. Conf. Neural Information Processing Systems, pp. 462-469, 2002.
[14] B.V. Dasarathy, “Minimal Consistent Set (MCS) Identification for Optimal Nearest Neighbor Decision Systems Design,” IEEE Trans. Systems, Man, and Cybernetics, vol. 24, no. 3, pp. 511-517, Mar. 1994.
[15] B.V. Dasarathy, “Fuzzy Understanding of Neighborhoods with Nearest Unlike Neighbor Sets,” Proc. SPIE, pp. 34-43, 1995.
[16] B.V. Dasarathy, “Nearest Unlike Neighbor (NUN): An Aid to Decision Confidence Estimation,” Optical Eng., vol. 34, no. 9, pp. 2785-2792, 1995.
[17] J. Demsar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” J. Machine Learning Research, vol. 7, pp. 1-30, 2006.
[18] C. Domeniconi, J. Peng, and D. Gunopulos, “Locally Adaptive Metric Nearest Neighbor Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1281-1285, Sept. 2002.
[19] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” J. Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.
[20] S. García, J. Ramón Cano, and F. Herrera, “A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling up Approach,” Pattern Recognition, vol. 41, no. 8, pp. 2693-2709, 2008.
[21] R. Gilad-Bachrach, A. Navot, and N. Tishby, “Margin Based Feature Selection—Theory and Algorithms,” Proc. Int'l Conf. Machine Learning, 2004.
[22] P.J. Grother, G.T. Candela, and J.L. Blue, “Fast Implementation of Nearest Neighbor Classifiers,” Pattern Recognition, vol. 30, pp. 459-465, 1997.
[23] P.E. Hart, “The Condensed Nearest Neighbor Rule,” IEEE Trans. Information Theory, vol. 14, no. 3, pp. 515-516, May 1968.
[24] N. Jankowski and M. Grochowski, “Comparison of Instances Selection Algorithms II. Results and Comments,” Artificial Intelligence and Soft Computing, pp. 580-585, Springer, 2004.
[25] J.M. Kleinberg, “Two Algorithms for Nearest-Neighbor Search in High Dimensions,” Proc. 29th ACM Symp. Theory of Computing, pp. 599-608, 1997.
[26] K. Krishna, M.A.L. Thathachar, and K.R. Ramakrishnan, “Voronoi Networks and Their Probability of Misclassification,” IEEE Trans. Neural Networks, vol. 11, no. 6, pp. 1361-1372, Nov. 2000.
[27] L.I. Kuncheva and L.C. Jain, “Nearest Neighbor Classifier: Simultaneous Editing and Feature Selection,” Pattern Recognition Letters, vol. 20, nos. 11-13, pp. 1149-1156, 1999.
[28] J. Lin, “Divergence Measures Based on the Shannon Entropy,” IEEE Trans. Information Theory, vol. 37, no. 1, pp. 145-151, Jan. 1991.
[29] E. Marchiori, “Hit Miss Networks with Applications to Instance Selection,” J. Machine Learning Research, vol. 9, pp. 997-1017, 2008.
[30] R.B. McCammon, “Map Pattern Reconstruction from Sample Data; Mississippi Delta Region of Southeast Louisiana,” J. Sedimentary Petrology, vol. 42, no. 2, pp. 422-424, 1972.
[31] R. Paredes and E. Vidal, “Learning Weighted Metrics to Minimize Nearest-Neighbor Classification Error,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1100-1110, July 2006.
[32] J. Peng, D.R. Heisterkamp, and H.K. Dai, “Adaptive Quasiconformal Kernel Nearest Neighbor Classification,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 656-661, May 2004.
[33] E. Pękalska, R.P.W. Duin, and P. Paclík, “Prototype Selection for Dissimilarity-Based Classifiers,” Pattern Recognition, vol. 39, no. 2, pp. 189-208, 2006.
[34] G. Rätsch, T. Onoda, and K.-R. Müller, “Soft Margins for AdaBoost,” Machine Learning, vol. 42, no. 3, pp. 287-320, 2001.
[35] G.L. Ritter, H.B. Woodruff, S.R. Lowry, and T.L. Isenhour, “An Algorithm for a Selective Nearest Neighbor Decision Rule,” IEEE Trans. Information Theory, vol. 21, no. 6, pp. 665-669, Nov. 1975.
[36] J.S. Sánchez, R.A. Mollineda, and J.M. Sotoca, “An Analysis of How Training Data Complexity Affects the Nearest Neighbor Classifiers,” Pattern Analysis and Applications, vol. 10, no. 3, pp. 189-201, 2007.
[37] R.E. Schapire, Y. Freund, P.L. Bartlett, and W.S. Lee, “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods,” Proc. 14th Int'l Conf. Machine Learning, pp. 322-330, 1997.
[38] M.A. Tahir, A. Bouridane, and F. Kurugollu, “Simultaneous Feature Selection and Feature Weighting Using Hybrid Tabu Search/K-Nearest Neighbor Classifier,” Pattern Recognition Letters, vol. 28, no. 4, pp. 438-446, 2007.
[39] I. Tomek, “An Experiment with the Edited Nearest-Neighbor Rule,” IEEE Trans. Systems, Man, and Cybernetics, vol. 6, no. 6, pp. 448-452, 1976.
[40] G.T. Toussaint, “Proximity Graphs for Nearest Neighbor Decision Rules: Recent Progress,” Proc. Interface-2002, 34th Symp. Computing and Statistics, pp. 83-106, 2002.
[41] V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[42] J. Wang, P. Neskovic, and L.N. Cooper, “Improving Nearest Neighbor Rule with a Simple Adaptive Distance Measure,” Pattern Recognition Letters, vol. 28, no. 2, pp. 207-213, 2007.
[43] K.Q. Weinberger, J. Blitzer, and L.K. Saul, “Distance Metric Learning for Large Margin Nearest Neighbor Classification,” Proc. Conf. Neural Information Processing Systems, 2006.
[44] D.L. Wilson, “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,” IEEE Trans. Systems, Man, and Cybernetics, vol. 2, no. 3, pp. 408-420, July 1972.
[45] D.R. Wilson and T.R. Martinez, “Instance Pruning Techniques,” Proc. 14th Int'l Conf. Machine Learning, pp. 403-411, 1997.
[46] D.R. Wilson and T.R. Martinez, “Reduction Techniques for Instance-Based Learning Algorithms,” Machine Learning, vol. 38, no. 3, pp. 257-286, 2000.

Index Terms:
Computing methodologies, artificial intelligence, learning, heuristics design, machine learning.
Citation:
Elena Marchiori, "Class Conditional Nearest Neighbor for Large Margin Instance Selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 364-370, Feb. 2010, doi:10.1109/TPAMI.2009.164