Issue No.12 - Dec. (2013 vol.35)
pp: 2878-2890
Yi Yang , Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
Deva Ramanan , Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
ABSTRACT
We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, nonoriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: 1) they efficiently model articulation by sharing computation across similar warps, 2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and 3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity), with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other. In contrast, our extensive diagnostic evaluation suggests that flexible structure and joint training are crucial for strong performance. Experimental results on standard benchmarks suggest that our approach is the state-of-the-art system for pose estimation, improving on past work on the challenging Parse and Buffy datasets while being orders of magnitude faster.
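The dynamic programming step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names (`best_configuration`, `configuration_score`), the shared list of candidate locations, and the single isotropic quadratic spring are all assumptions made for illustration. Each part picks a mixture type and a location; the score adds a local appearance term per part, a co-occurrence term per parent-child type pair, and a deformation penalty per parent-child location pair.

```python
import numpy as np


def configuration_score(parent, unary, cooc, locs, states, deform_weight=1.0):
    """Score one full assignment; states[i] = (type, location index) of part i."""
    total = 0.0
    for i, (t, l) in enumerate(states):
        total += unary[i][t, l]                      # local appearance score
        if parent[i] >= 0:
            tp, lp = states[parent[i]]
            d = locs[l] - locs[lp]
            total += cooc[i][t, tp]                  # type co-occurrence score
            total -= deform_weight * float(d @ d)    # toy quadratic spring
    return float(total)


def best_configuration(parent, unary, cooc, locs, deform_weight=1.0):
    """Max-sum dynamic programming over a tree of parts.

    Assumes part 0 is the root (parent[0] == -1) and parent[i] < i otherwise.
    unary[i] has shape (T, L); cooc[i] has shape (T, T), indexed as
    [child type, parent type]; locs has shape (L, 2).
    """
    n = len(parent)
    T, L = unary[0].shape
    diff = locs[:, None, :] - locs[None, :, :]
    spring = -deform_weight * (diff ** 2).sum(axis=-1)        # (L, L)

    score = [np.array(u, dtype=float) for u in unary]
    argmax = [None] * n
    # Leaves-to-root pass: each part sends its parent the best achievable
    # subtree score for every parent state (type, location).
    for i in range(n - 1, 0, -1):
        pair = cooc[i][:, None, :, None] + spring[None, :, None, :]
        total = score[i][:, :, None, None] + pair             # (T, L, T, L)
        flat = total.reshape(T * L, T, L)
        argmax[i] = flat.argmax(axis=0)
        score[parent[i]] += flat.max(axis=0)

    # Root-to-leaves pass: read off the maximizing states.
    states = [None] * n
    t0, l0 = np.unravel_index(score[0].argmax(), (T, L))
    states[0] = (int(t0), int(l0))
    for i in range(1, n):
        tp, lp = states[parent[i]]
        t, l = np.unravel_index(argmax[i][tp, lp], (T, L))
        states[i] = (int(t), int(l))
    return float(score[0].max()), states


# Toy example: 4 parts in a tree, 2 mixture types, 3 candidate locations.
rng = np.random.default_rng(0)
parent = [-1, 0, 1, 1]
T, L = 2, 3
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
unary = [rng.normal(size=(T, L)) for _ in parent]
cooc = [rng.normal(size=(T, T)) for _ in parent]
best, states = best_configuration(parent, unary, cooc, locs)
print(best, states)
```

This toy version compares every child state with every parent state, so its cost grows quadratically with the number of candidate locations. Pictorial structure methods typically avoid that cost for quadratic springs by using distance transforms, which this sketch omits for brevity.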
INDEX TERMS
Computational modeling, Human factors, Object segmentation, Deformable models, Pose estimation, Shape analysis, deformable part models, object detection, articulated shapes
CITATION
Yi Yang, Deva Ramanan, "Articulated Human Detection with Flexible Mixtures of Parts", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 12, pp. 2878-2890, Dec. 2013, doi:10.1109/TPAMI.2012.261