Issue No.06 - June (2010 vol.32)
pp: 1029-1043
Long (Leo) Zhu , Massachusetts Institute of Technology, Cambridge
Yuanhao Chen , University of Science and Technology of China, Hefei
Alan Yuille , University of California, Los Angeles, Los Angeles
ABSTRACT
In this paper, we address the tasks of detecting, segmenting, parsing, and matching deformable objects. We use a novel probabilistic object model that we call a hierarchical deformable template (HDT). The HDT represents the object by state variables defined over a hierarchy (typically with five levels). The hierarchy is built recursively by composing elementary structures to form more complex structures. A probability distribution (a parameterized exponential model) is defined over the hierarchy to quantify the variability in shape and appearance of the object at multiple scales. To perform inference, that is, to estimate the most probable states of the hierarchy for an input image, we use a bottom-up algorithm called compositional inference. This algorithm is an approximate version of dynamic programming in which approximations (e.g., pruning) are made to ensure that the algorithm is fast while maintaining high performance. We adapt the structure-perceptron algorithm to estimate the parameters of the HDT in a discriminative manner (simultaneously estimating the appearance and shape parameters). More precisely, we specify an exponential distribution for the HDT using a dictionary of potentials, which capture the appearance and shape cues. This dictionary can be large, so the potentials do not need to be handcrafted. Instead, structure-perceptron assigns weights to the potentials so that less important potentials receive small weights (this is like a "soft" form of feature selection). Finally, we provide experimental evaluation of HDTs on different visual tasks, including detection, segmentation, matching (alignment), and parsing. We show that HDTs achieve state-of-the-art performance for these different tasks when evaluated on data sets with ground truth (and when compared to alternative algorithms, which are typically specialized to each task).
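The structure-perceptron idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three potentials in `phi`, the binary label sequences, and the exhaustive `infer` routine (a stand-in for compositional inference) are all hypothetical choices made only to keep the example small. It shows the general pattern of a structured perceptron with averaged weights: run inference with the current weights, and when the prediction differs from the ground truth, move the weights toward the potentials of the true structure and away from those of the predicted one.

```python
from itertools import product

import numpy as np


def phi(x, y):
    """Hypothetical dictionary of potentials: two unary cues and one pairwise cue."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    unary_on = float(np.sum(x * (y == 1)))    # evidence collected under label 1
    unary_off = float(np.sum(x * (y == 0)))   # evidence collected under label 0
    smooth = float(np.sum(y[1:] == y[:-1]))   # number of agreeing neighbours
    return np.array([unary_on, unary_off, smooth])


def infer(w, x):
    """Exhaustive argmax over label sequences.

    Assumption: this brute-force search only stands in for a real inference
    algorithm; it is feasible here because the toy inputs are tiny.
    """
    candidates = product((0, 1), repeat=len(x))
    return max(candidates, key=lambda y: float(w @ phi(x, y)))


def structure_perceptron(data, epochs=10):
    """Return averaged weights learned from (input, true structure) pairs."""
    w = np.zeros(3)
    w_sum = np.zeros(3)
    steps = 0
    for _ in range(epochs):
        for x, y_true in data:
            y_pred = infer(w, x)
            if tuple(y_pred) != tuple(y_true):
                # Reward the potentials of the true structure and
                # penalize those of the wrongly predicted one.
                w = w + phi(x, y_true) - phi(x, y_pred)
            w_sum += w
            steps += 1
    return w_sum / steps


# Toy training set (made up for illustration).
data = [
    ([1.0, 1.0, -1.0, -1.0], (1, 1, 0, 0)),
    ([-1.0, 1.0, 1.0], (0, 1, 1)),
]
weights = structure_perceptron(data)
predictions = [infer(weights, x) for x, _ in data]
```

In this toy setting, a potential that does not help separate true from predicted structures tends to accumulate little weight, which loosely mirrors the "soft" feature selection described above.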
INDEX TERMS
Hierarchy, shape representation, object parsing, segmentation, shape matching, structured learning.
CITATION
Long (Leo) Zhu, Yuanhao Chen, Alan Yuille, "Learning a Hierarchical Deformable Template for Rapid Deformable Object Parsing," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 6, pp. 1029-1043, June 2010, doi:10.1109/TPAMI.2009.65