Issue No.09 - September (2010 vol.22)
pp: 1274-1285
Eric K. Garcia, University of Washington, Seattle
Sergey Feldman, University of Washington, Seattle
Maya R. Gupta, University of Washington, Seattle
Santosh Srivastava, Fred Hutchinson Cancer Research Center, Seattle
Local classifiers are sometimes called lazy learners because they do not train a classifier until presented with a test sample. However, such methods are generally not completely lazy because the neighborhood size k (or other locality parameter) is usually chosen by cross validation on the training set, which can require significant preprocessing and risks overfitting. We propose a simple alternative to cross validation of the neighborhood size that requires no preprocessing: instead of committing to one neighborhood size, average the discriminants for multiple neighborhoods. We show that this forms an expected estimated posterior that minimizes the expected Bregman loss with respect to the uncertainty about the neighborhood choice. We analyze this approach for six standard and state-of-the-art local classifiers, including discriminative adaptive metric kNN (DANN), a local support vector machine (SVM-KNN), hyperplane distance nearest neighbor (HKNN), and a new local Bayesian quadratic discriminant analysis (local BDA). The empirical effectiveness of this technique versus cross validation is confirmed with experiments on seven benchmark data sets, showing that similar classification performance can be attained without any training.
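The core idea, averaging over neighborhood sizes rather than selecting one by cross validation, can be illustrated with a minimal sketch for plain kNN. This is not the authors' implementation: the function name `averaged_knn_posterior`, the candidate set `ks`, the Euclidean metric, and the uniform weighting over neighborhood sizes are all assumptions made for illustration.

```python
import numpy as np

def averaged_knn_posterior(X_train, y_train, x, ks, n_classes):
    """Average kNN class-posterior estimates at x over several neighborhood sizes.

    Hypothetical helper: assumes Euclidean distance, integer labels in
    0..n_classes-1, and equal weight for every k in ks.
    """
    # Sort the training points by distance to the test point once;
    # no preprocessing of the training set is needed beforehand.
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists, kind="stable")

    # For each k, the posterior estimate is the label frequency
    # among the k nearest neighbors.
    posteriors = [
        np.bincount(y_train[order[:k]], minlength=n_classes) / k
        for k in ks
    ]

    # Uniform average over the candidate neighborhood sizes.
    return np.mean(posteriors, axis=0)

# Toy data: two well-separated 1-D clusters.
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.5])

mean_posterior = averaged_knn_posterior(X_train, y_train, x, ks=(1, 3, 5), n_classes=2)
predicted_class = int(np.argmax(mean_posterior))
```

In this toy case the estimates for k = 1 and k = 3 put all mass on class 0, while k = 5 reaches into the second cluster; the average still favors class 0 without any k having been tuned on the training set.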
Lazy learning, Bayesian estimation, cross validation, local learning, quadratic discriminant analysis.
Eric K. Garcia, Sergey Feldman, Maya R. Gupta, Santosh Srivastava, "Completely Lazy Learning", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 9, pp. 1274-1285, September 2010, doi:10.1109/TKDE.2009.159