Issue No. 07, July 2012 (vol. 34)
pp. 1354-1367
Zhangzhang Si , University of California, Los Angeles
Song-Chun Zhu , University of California, Los Angeles
This paper presents a novel framework for learning a generative image representation, the hybrid image template (HIT), from a small number (3 to 20) of image examples. Each learned template is typically composed of 50 to 500 image patches whose geometric attributes (location, scale, orientation) may adapt within a local neighborhood to account for deformation, and whose appearances are characterized by four types of descriptors: local sketches (edges or bars), texture gradients with orientations, flat regions, and colors. These heterogeneous patches are automatically ranked and selected from a large pool according to their information gains, using an information projection framework. Intuitively, a patch has a higher information gain if 1) its feature statistics are consistent across the training examples and distinct from the statistics of negative examples (i.e., generic images or examples from other categories), and 2) its feature statistics show less intraclass variation. The learning process pursues the most informative patches (for either generative or discriminative purposes) one at a time and stops when the information gain falls within statistical fluctuation. The template is associated with a well-normalized probability model that integrates the heterogeneous feature statistics. This automated feature selection procedure allows our algorithm to scale to a wide range of image categories, from those with regular shapes to those with stochastic textures. The learned representation captures the intrinsic characteristics of the object or scene categories. We evaluate the hybrid image templates on several public benchmarks and demonstrate classification performance on par with state-of-the-art methods such as HoG+SVM; when small training sets are used, the proposed system shows a clear advantage.
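The idea of ranking patches by information gain and pursuing them one at a time can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation. It assumes that each candidate patch yields one scalar response and that the negative-example statistics are summarized by a hypothetical binned background histogram `q`. It also assumes that a selected patch is modeled by the exponential tilt p(r) ∝ q(r) exp(λr), with λ fitted so that the model mean matches the mean response on the positive examples. The gain is then the KL divergence λ·mean - log Z(λ). Candidates are treated as independent, so overlapping patches do not interact. Only the first moment is matched, so the intraclass-variance criterion is not captured. The fixed `threshold` is a hypothetical stand-in for the statistical-fluctuation stopping rule. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def log_partition(bins: Sequence[float], q: Sequence[float], lam: float) -> float:
    """log Z(lam) = log sum_k q_k * exp(lam * r_k), via log-sum-exp."""
    terms = [math.log(qk) + lam * rk for rk, qk in zip(bins, q) if qk > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def tilted_mean(bins: Sequence[float], q: Sequence[float], lam: float) -> float:
    """E_p[r] under the tilted model p(r) proportional to q(r) * exp(lam * r)."""
    log_z = log_partition(bins, q, lam)
    return sum(rk * math.exp(math.log(qk) + lam * rk - log_z)
               for rk, qk in zip(bins, q) if qk > 0)


def fit_lambda(bins, q, target, lo=-50.0, hi=50.0, iters=100) -> float:
    """Bisection for lam such that E_p[r] equals `target`.

    E_p[r] is nondecreasing in lam (its derivative is Var_p[r] >= 0),
    so bisection on [lo, hi] converges; unreachable targets end at a bound.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(bins, q, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def information_gain(bins, q, responses: Sequence[float]) -> Tuple[float, float]:
    """Return (lam, gain), where gain = KL(p_lam || q) = lam * mean - log Z(lam)."""
    mean = sum(responses) / len(responses)
    lam = fit_lambda(bins, q, mean)
    return lam, lam * mean - log_partition(bins, q, lam)


def pursue_patches(bins, q, pos_responses: List[Sequence[float]],
                   threshold: float) -> List[Tuple[int, float, float]]:
    """Greedily select (index, lam, gain) triples, most informative first,
    stopping when the best remaining gain drops below `threshold`."""
    selected: List[Tuple[int, float, float]] = []
    remaining = set(range(len(pos_responses)))
    while remaining:
        # Score every remaining candidate. With independent candidates these
        # scores never change; a coupled model would re-score here.
        scored = [(j, *information_gain(bins, q, pos_responses[j]))
                  for j in sorted(remaining)]
        j, lam, gain = max(scored, key=lambda s: s[2])
        if gain < threshold:
            break
        selected.append((j, lam, gain))
        remaining.remove(j)
    return selected


# Toy data: background histogram over three response bins, three candidates.
bins = [0.0, 0.5, 1.0]
q = [0.5, 0.3, 0.2]
pos = [[0.9, 1.0, 0.8], [0.3, 0.4, 0.35], [0.6, 0.5, 0.7]]
selected = pursue_patches(bins, q, pos, threshold=0.05)
print(selected)
```

In this toy run, the candidate whose positive responses sit far above the background mean is selected first. The candidate whose mean matches the background has near-zero gain and stops the pursuit. Because the candidates are independent here, the greedy loop amounts to ranking by gain. A model in which selected patches interact would need the re-scoring step inside the loop.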
Image representation, deformable templates, information projection, visual learning, statistical modeling.
Zhangzhang Si, Song-Chun Zhu, "Learning Hybrid Image Templates (HIT) by Information Projection", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 7, pp. 1354-1367, July 2012, doi:10.1109/TPAMI.2011.227