Supervised Learning of Quantizer Codebooks by Information Loss Minimization
July 2009 (vol. 31 no. 7)
pp. 1294-1309
Svetlana Lazebnik, University of North Carolina at Chapel Hill, Chapel Hill
Maxim Raginsky, Duke University, Durham
This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels, based on minimizing empirical information loss, such that the quantizer index of a given feature vector approximates a sufficient statistic for its class label. Informally, the quantized representation retains as much information as possible for classifying the feature vector correctly. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions; it also admits an elegant interpretation in terms of lossless source coding. The proposed method is validated on synthetic and real data sets and is applied to two diverse problems: learning discriminative visual vocabularies for bag-of-features image classification, and image segmentation.
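The alternating minimization described in the abstract can be sketched as a Lloyd-style loop that clusters each point using a combined cost in both spaces: squared Euclidean distance to a feature-space codeword plus a KL-divergence penalty between the point's class posterior and the codeword's posterior. This is purely an illustrative sketch, not the paper's exact formulation; the trade-off weight `lam`, the random initialization, and the simple mean updates are assumptions made for brevity.

```python
import numpy as np

def kl_rows(P, q, eps=1e-12):
    # KL divergence from each row of P to a single distribution q
    return np.sum(P * (np.log(P + eps) - np.log(q + eps)), axis=-1)

def joint_quantize(X, P, K, lam=1.0, iters=50, seed=0):
    """Illustrative alternating minimization: learn K codewords (m_k, q_k),
    with m_k in feature space and q_k in the probability simplex.
    Each point is assigned to the codeword minimizing
    ||x - m_k||^2 + lam * KL(p || q_k)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, K, replace=False)
    M, Q = X[idx].copy(), P[idx].copy()  # initialize from random data points
    assign = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Assignment step: combined Euclidean + information-loss cost.
        d_feat = ((X[:, None, :] - M[None, :, :]) ** 2).sum(-1)
        d_info = np.stack([kl_rows(P, Q[k]) for k in range(K)], axis=1)
        assign = np.argmin(d_feat + lam * d_info, axis=1)
        # Update step: the minimizer in both spaces is the cell mean
        # (for sum_i KL(p_i || q), the optimal q is the average of the p_i).
        for k in range(K):
            mask = assign == k
            if mask.any():
                M[k] = X[mask].mean(0)
                Q[k] = P[mask].mean(0)
                Q[k] /= Q[k].sum()  # renormalize against numerical drift
    return M, Q, assign
```

Because both steps never increase the objective, the loop converges to a local minimum, mirroring the structure of Bregman clustering [2]; encoding a new, unlabeled point uses only the learned `(M, Q)` pair.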

[1] A. Aiyer, K. Pyun, Y. Huang, D.B. O'Brien, and R.M. Gray, “Lloyd Clustering of Gauss Mixture Models for Image Compression and Classification,” Signal Processing: Image Comm., vol. 20, pp. 459-485, 2005.
[2] A. Banerjee, S. Merugu, I.S. Dhillon, and J. Ghosh, “Clustering with Bregman Divergences,” J. Machine Learning Research, vol. 6, pp. 1705-1749, 2005.
[3] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Prentice-Hall, 1971.
[4] C. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, 1995.
[5] D. Blackwell and M.A. Girshick, Theory of Games and Statistical Decisions. Wiley, 1954.
[6] L.M. Bregman, “The Relaxation Method of Finding the Common Points of Convex Sets and Its Application to the Solution of Problems in Convex Programming,” USSR Computational Math. and Math. Physics, vol. 7, pp. 200-217, 1967.
[7] T.M. Cover and J.A. Thomas, Elements of Information Theory, second ed. Wiley, 2006.
[8] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, “Visual Categorization with Bags of Keypoints,” Proc. ECCV Workshop Statistical Learning in Computer Vision, 2004.
[9] L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition. Springer, 1996.
[10] I. Dhillon, S. Mallela, and R. Kumar, “A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification,” J. Machine Learning Research, vol. 3, pp. 1265-1287, 2003.
[11] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression. Kluwer, 1992.
[12] P.D. Grünwald, The Minimum Description Length Principle. MIT Press, 2007.
[13] L. Fei-Fei and P. Perona, “A Bayesian Hierarchical Model for Learning Natural Scene Categories,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2001.
[15] M. Heiler and C. Schnörr, “Natural Image Statistics for Natural Image Segmentation,” Int'l J. Computer Vision, vol. 63, no. 1, pp. 5-19, 2005.
[16] J. Huang and D. Mumford, “Statistics of Natural Images and Models,” Proc. Int'l Conf. Computer Vision, vol. 1, pp. 541-547, 1999.
[17] T. Kohonen, “Learning Vector Quantization for Pattern Recognition,” Technical Report TKK-F-A601, Helsinki Inst. of Technology, 1986.
[18] T. Kohonen, “Improved Versions of Learning Vector Quantization,” Proc. IEEE Int'l Joint Conf. Neural Networks, vol. 1, pp. 545-550, 1990.
[19] T. Kohonen, Self-Organizing Maps, third ed. Springer-Verlag, 2000.
[20] S. Kullback, Information Theory and Statistics. Dover, 1968.
[21] D. Larlus and F. Jurie, “Latent Mixture Vocabularies for Object Characterization,” Proc. British Machine Vision Conf., 2005.
[22] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[23] S. Lazebnik and M. Raginsky, “Learning Nearest-Neighbor Quantizers from Labeled Data by Information Loss Minimization,” Proc. 11th Int'l Conf. Artificial Intelligence and Statistics, 2007.
[24] T. Linder, “Learning-Theoretic Methods in Vector Quantization,” Principles of Nonparametric Learning, L. Györfi, ed., Springer-Verlag, 2001.
[25] D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[26] A. McCallum and K. Nigam, “A Comparison of Event Models for Naive Bayes Text Classification,” Proc. AAAI-98 Workshop Learning for Text Categorization, pp. 41-48, 1998.
[27] F. Moosmann, B. Triggs, and F. Jurie, “Randomized Clustering Forests for Building Fast and Discriminative Visual Vocabularies,” Proc. Neural Information Processing Systems, 2006.
[28] F. Odone, A. Barla, and A. Verri, “Building Kernels from Binary Strings for Image Matching,” IEEE Trans. Image Processing, vol. 14, no. 2, pp. 169-180, Feb. 2005.
[29] K.L. Oehler and R.M. Gray, “Combining Image Compression and Classification Using Vector Quantization,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 5, pp. 461-473, May 1995.
[30] A. Oliva and A. Torralba, “Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope,” Int'l J. Computer Vision, vol. 42, no. 3, pp. 145-175, 2001.
[31] E. Parzen, “On Estimation of a Probability Density Function and Mode,” Annals Math. Statistics, vol. 33, no. 3, pp. 1065-1076, 1962.
[32] A. Rao, D. Miller, K. Rose, and A. Gersho, “A Generalized VQ Method for Combined Compression and Estimation,” Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 2032-2035, 1996.
[33] X. Ren and J. Malik, “Learning a Classification Model for Segmentation,” Proc. Int'l Conf. Computer Vision, vol. 1, pp. 10-17, 2003.
[34] J. Rissanen, “Modelling by the Shortest Data Description,” Automatica, vol. 14, pp. 465-471, 1978.
[35] C.P. Robert, The Bayesian Choice, second ed. Springer-Verlag, 2001.
[36] K. Rose, “Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems,” Proc. IEEE, vol. 86, no. 11, pp. 2210-2239, 1998.
[37] G. Salton and M. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, 1986.
[38] N. Slonim, G.S. Atwal, G. Tkačik, and W. Bialek, “Information-Based Clustering,” Proc. Nat'l Academy of Sciences, vol. 102, pp. 18297-18302, 2005.
[39] J. Sivic and A. Zisserman, “Video Google: A Text Retrieval Approach to Object Matching in Videos,” Proc. Int'l Conf. Computer Vision, vol. 2, pp. 1470-1477, 2003.
[40] N. Slonim and N. Tishby, “The Power of Word Clusters for Text Classification,” Proc. 23rd European Colloquium on Information Retrieval Research, 2001.
[41] M. Swain and D. Ballard, “Color Indexing,” Int'l J. Computer Vision, vol. 7, no. 1, pp. 11-32, 1991.
[42] N. Tishby, F.C. Pereira, and W. Bialek, “The Information Bottleneck Method,” Proc. 37th Ann. Allerton Conf. Comm., Control and Computing, pp. 368-377, 1999.
[43] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, “Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study,” Int'l J. Computer Vision, vol. 73, no. 2, pp. 213-238, 2007.

Index Terms:
Pattern recognition, information theory, quantization, clustering, computer vision, scene analysis, segmentation.
Svetlana Lazebnik, Maxim Raginsky, "Supervised Learning of Quantizer Codebooks by Information Loss Minimization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 7, pp. 1294-1309, July 2009, doi:10.1109/TPAMI.2008.138