Nonparametric Supervised Learning by Linear Interpolation with Maximum Entropy
May 2006 (vol. 28, no. 5)
pp. 766-781
Nonparametric neighborhood methods for learning entail estimation of class-conditional probabilities based on relative frequencies of samples that are "near-neighbors" of a test point. We propose and explore the behavior of a learning algorithm that uses linear interpolation and the principle of maximum entropy (LIME). We consider some theoretical properties of the LIME algorithm: LIME weights have exponential form; the estimates are consistent; and the estimates are robust to additive noise. In relation to bias reduction, we show that near-neighbors contain a test point in their convex hull asymptotically. The common linear interpolation solution used for regression on grids or lookup tables is shown to solve a related maximum entropy problem. LIME simulation results support use of the method, and performance on a pipeline integrity classification problem demonstrates that the proposed algorithm has practical value.
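The abstract describes the core optimization concisely enough to sketch: choose convex weights on the k nearest neighbors that linearly interpolate the test point, taking the maximum-entropy solution among all feasible weightings; class-conditional probabilities are then weighted relative frequencies of the neighbor labels. The following is a minimal sketch under that reading, not the authors' implementation: the function name lime_weights, the generic SLSQP solver, and the toy data are illustrative assumptions (the paper notes the exact LIME weights have exponential form, which a dedicated dual solver would exploit instead).

```python
# Minimal sketch of the LIME idea from the abstract: maximum-entropy
# convex weights on nearest neighbors, constrained to linearly
# interpolate the test point. Illustrative only, not the paper's code.
import numpy as np
from scipy.optimize import minimize

def lime_weights(neighbors, x):
    """Weights w >= 0 with sum(w) = 1 and sum_i w_i * neighbors[i] = x,
    chosen to maximize entropy H(w). neighbors has shape (k, d)."""
    k = len(neighbors)
    # Minimize negative entropy; a small epsilon guards log(0).
    neg_entropy = lambda w: float(np.sum(w * np.log(w + 1e-12)))
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},      # convex combination
        {"type": "eq", "fun": lambda w: neighbors.T @ w - x},  # interpolate x
    ]
    w0 = np.full(k, 1.0 / k)  # uniform (maximum-entropy) starting point
    res = minimize(neg_entropy, w0, bounds=[(0.0, 1.0)] * k,
                   constraints=constraints)  # SLSQP is used by default here
    return res.x

# Toy usage: estimate P(class 1 | x) as the weighted relative frequency
# of neighbor labels, as the abstract's neighborhood methods do.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                # training features
y = (X[:, 0] > 0).astype(int)               # binary labels
x = np.array([0.1, -0.2])                   # test point
idx = np.argsort(np.linalg.norm(X - x, axis=1))[:6]  # 6 nearest neighbors
w = lime_weights(X[idx], x)
print(np.sum(w * y[idx]))                   # estimated P(class 1 | x)
```

Note that the interpolation constraint is feasible only when the test point lies in the neighbors' convex hull, which, per the abstract, holds asymptotically; a production implementation would solve for the exponential-form weights directly rather than calling a generic constrained optimizer.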

Index Terms:
Nonparametric statistics, probabilistic algorithms, pattern recognition, maximum entropy, linear interpolation.
Citation:
Maya R. Gupta, Robert M. Gray, Richard A. Olshen, "Nonparametric Supervised Learning by Linear Interpolation with Maximum Entropy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 766-781, May 2006, doi:10.1109/TPAMI.2006.101