Issue No.09 - Sept. (2013 vol.35)
pp: 2189-2205
Zhangzhang Si, Dept. of Stat., Univ. of California, Los Angeles, Los Angeles, CA, USA
Song-Chun Zhu, Dept. of Stat., Univ. of California, Los Angeles, Los Angeles, CA, USA
ABSTRACT
This paper presents a framework for unsupervised learning of a hierarchical reconfigurable image template, the AND-OR Template (AOT), for visual objects. The AOT includes: 1) hierarchical composition as "AND" nodes, 2) deformation and articulation of parts as geometric "OR" nodes, and 3) multiple ways of composition as structural "OR" nodes. The terminal nodes are hybrid image templates (HIT) [17] that are fully generative to the pixels. We show that both the structure and the parameters of the AOT model can be learned in an unsupervised way from images using an information projection principle. The learning algorithm consists of two steps: 1) a recursive block pursuit procedure to learn the hierarchical dictionary of primitives, parts, and objects, and 2) a graph compression procedure to minimize model structure for better generalizability. We investigate the factors that influence how well the learning algorithm can identify the underlying AOT. We also propose a number of ways to evaluate the performance of the learned AOTs on both synthesized examples and real-world images. Our model advances the state of the art for object detection by improving the accuracy of template matching.
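The three node types named in the abstract can be pictured with a small data-structure sketch. This is only an illustration, not the authors' implementation: the class names, the `score` function, and the rule that an AND node sums its children while an OR node keeps its best-scoring child are assumptions borrowed from common AND-OR graph practice, and the terminal responses stand in for whatever a hybrid image template would actually compute.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Terminal:
    """Leaf node; stands in for a terminal template (hypothetical)."""
    name: str


@dataclass
class And:
    """Composition: all children must be present."""
    children: List["Node"]


@dataclass
class Or:
    """Alternative: exactly one child is selected."""
    children: List["Node"]


Node = Union[Terminal, And, Or]


def score(node: Node, responses: Dict[str, float]) -> float:
    """Assumed scoring rule: AND sums its children, OR takes the best child."""
    if isinstance(node, Terminal):
        # Missing terminals contribute nothing in this toy example.
        return responses.get(node.name, 0.0)
    child_scores = [score(child, responses) for child in node.children]
    if isinstance(node, And):
        return sum(child_scores)
    return max(child_scores)


# Toy template: a face is eyes AND one of two alternative ear shapes.
ear = Or([Terminal("ear_round"), Terminal("ear_pointed")])
face = And([Terminal("eyes"), ear])
responses = {"eyes": 2.0, "ear_round": 0.5, "ear_pointed": 1.5}
print(score(face, responses))  # 3.5
```

The OR node is what makes the template reconfigurable in this sketch: swapping which ear shape responds more strongly changes the selected branch without changing the structure above it.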
INDEX TERMS
Training, Histograms, Image color analysis, Unsupervised learning, Visualization, Animals, Face, information projection, Deformable templates, object recognition, image grammar
CITATION
Zhangzhang Si, Song-Chun Zhu, "Learning AND-OR Templates for Object Recognition and Detection", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 9, pp. 2189-2205, Sept. 2013, doi:10.1109/TPAMI.2013.35
REFERENCES
[1] E. Borenstein and S. Ullman, "Class-Specific, Top-Down Segmentation," Proc. Seventh European Conf. Computer Vision, pp. 109-124, 2002.
[2] L.-B. Chang, Y. Jin, W. Zhang, E. Borenstein, and S. Geman, "Context, Computation, and Optimal ROC Performance in Hierarchical Models," Int'l J. Computer Vision, vol. 93, no. 2, pp. 117-140, 2011.
[3] T.F. Cootes, G.J. Edwards, and C.J. Taylor, "Active Appearance Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, June 2001.
[4] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[5] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object Detection with Discriminatively Trained Part-Based Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627-1645, Sept. 2010.
[6] P.F. Felzenszwalb and D.P. Huttenlocher, "Pictorial Structures for Object Recognition," Int'l J. Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.
[7] R. Fergus, P. Perona, and A. Zisserman, "Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition," Int'l J. Computer Vision, vol. 71, no. 3, pp. 273-303, 2007.
[8] S. Fidler, M. Boben, and A. Leonardis, "A Coarse-to-Fine Taxonomy of Constellations for Fast Multi-Class Object Detection," Proc. 11th European Conf. Computer Vision, 2010.
[9] S. Fidler and A. Leonardis, "Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[10] M. Guillaumin, J. Verbeek, and C. Schmid, "Is That You? Metric Learning Approaches for Face Identification," Proc. 12th IEEE Int'l Conf. Computer Vision, 2009.
[11] G.E. Hinton, S. Osindero, and Y. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, July 2006.
[12] Y. Jin and S. Geman, "Context and Hierarchy in a Probabilistic Image Model," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[13] S. Della Pietra, V. Della Pietra, and J. Lafferty, "Inducing Features of Random Fields," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 380-393, Apr. 1997.
[14] P. Schnitzspan, M. Fritz, S. Roth, and B. Schiele, "Discriminative Structure Learning of Hierarchical Representations for Object Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[15] G. Schwarz, "Estimating the Dimension of a Model," Ann. Statistics, vol. 6, no. 2, pp. 461-464, 1978.
[16] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, "Object Recognition with Cortex-Like Mechanisms," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 411-426, Mar. 2007.
[17] Z. Si and S.-C. Zhu, "Learning Hybrid Image Templates (HIT) by Information Projection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1354-1367, July 2012.
[18] E.B. Sudderth, A.B. Torralba, W.T. Freeman, and A.S. Willsky, "Describing Visual Scenes Using Transformed Objects and Parts," Int'l J. Computer Vision, vol. 77, no. 1-3, pp. 291-330, 2008.
[19] S. Todorovic and N. Ahuja, "Unsupervised Category Modeling, Recognition, and Segmentation in Images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 12, pp. 2158-2174, Dec. 2008.
[20] Y.N. Wu, Z. Si, H. Gong, and S.-C. Zhu, "Learning Active Basis Model for Object Detection and Recognition," Int'l J. Computer Vision, vol. 90, no. 2, pp. 198-230, 2010.
[21] Y. Yang and D. Ramanan, "Articulated Pose Estimation Using Flexible Mixtures of Parts," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2011.
[22] L. Zhu, Y. Chen, A. Torralba, A. Yuille, and W.T. Freeman, "Latent Hierarchical Structural Learning for Object Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[23] L. Zhu, Y. Chen, and A. Yuille, "Unsupervised Learning of Probabilistic Grammar-Markov Models for Object Categories," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 114-128, Jan. 2009.
[24] S.-C. Zhu and D. Mumford, "A Stochastic Grammar of Images," Foundations and Trends in Computer Graphics and Vision, vol. 2, no. 4, pp. 259-362, 2006.
[25] S.-C. Zhu, Y.N. Wu, and D.B. Mumford, "Minimax Entropy Principle and Its Applications to Texture Modeling," Neural Computation, vol. 9, no. 8, pp. 1627-1660, 1997.