Supervised Learning of Quantizer Codebooks by Information Loss Minimization

IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 7, July 2009



pp. 1294-1309

Svetlana Lazebnik , University of North Carolina at Chapel Hill, Chapel Hill

Maxim Raginsky , Duke University, Durham

ABSTRACT

This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the quantizer index of a given feature vector approximates a sufficient statistic for its class label. Informally, the quantized representation retains as much information as possible for classifying the feature vector correctly. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of lossless source coding. The proposed method is validated on synthetic and real data sets and is applied to two diverse problems: learning discriminative visual vocabularies for bag-of-features image classification, and image segmentation.
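The alternating minimization the abstract describes can be sketched as a two-step loop: assign each point to the codeword minimizing a joint distortion (squared Euclidean distance in feature space plus KL divergence in the simplex of class posteriors), then recompute both codebooks from the assignments. The sketch below is an illustrative assumption, not the authors' implementation; the distortion weight `lam`, the random initialization, and all names are made up for this example.

```python
import numpy as np

def kl(P, q, eps=1e-12):
    # D(p || q) for each row p of P against a single distribution q.
    return np.sum(P * (np.log(P + eps) - np.log(q + eps)), axis=-1)

def joint_quantize(X, P, K, lam=1.0, n_iter=100, seed=0):
    """Toy alternating minimization over a joint distortion
    lam * ||x - m_k||^2 + D_KL(p || q_k)  (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    init = rng.choice(n, size=K, replace=False)
    M = X[init].copy()      # codebook in the Euclidean feature space
    Q = P[init].copy()      # codebook in the simplex of class posteriors
    assign = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: pick the codeword minimizing the joint distortion.
        d_feat = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)   # (n, K)
        d_post = np.stack([kl(P, Q[k]) for k in range(K)], axis=1)     # (n, K)
        new_assign = np.argmin(lam * d_feat + d_post, axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # Update step: centroids in feature space and in the simplex.
        # For D(p_i || q), the arithmetic mean of the p_i minimizes the sum.
        for k in range(K):
            mask = assign == k
            if mask.any():
                M[k] = X[mask].mean(axis=0)
                Q[k] = P[mask].mean(axis=0)
    return M, Q, assign
```

Because the simplex centroid under KL divergence of this orientation is just the mean of the assigned posteriors, both update steps stay closed-form, which is what makes the alternating scheme tractable.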

INDEX TERMS

Pattern recognition, information theory, quantization, clustering, computer vision, scene analysis, segmentation.

CITATION

Svetlana Lazebnik, Maxim Raginsky, "Supervised Learning of Quantizer Codebooks by Information Loss Minimization," *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 31, no. 7, pp. 1294-1309, July 2009, doi:10.1109/TPAMI.2008.138