Articulated Human Detection with Flexible Mixtures of Parts
Dec. 2013 (vol. 35 no. 12)
pp. 2878-2890
Yi Yang, Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
Deva Ramanan, Dept. of Comput. Sci., Univ. of California at Irvine, Irvine, CA, USA
We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, nonoriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: 1) They efficiently model articulation by sharing computation across similar warps, 2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and 3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity) with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other, while we present an extensive diagnostic evaluation that suggests that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets while being orders of magnitude faster.
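The abstract's claim that tree-structured models "can be efficiently optimized with dynamic programming" can be illustrated with a minimal max-sum sketch. It is not the authors' implementation: the part names, the toy scoring functions, and the names `best_configuration`, `toy_unary`, `toy_pairwise`, `REST_OFFSET`, and `COOCCURRENCE` are all hypothetical. Each part's state is assumed to be a (position, mixture type) pair, so the pairwise term can combine a spatial deformation cost with a co-occurrence bonus between the types of neighboring parts.

```python
from itertools import product


def best_configuration(children, root, states, unary, pairwise):
    """Max-sum dynamic programming over a tree of parts.

    children: dict mapping a part to a list of its child parts
    states:   dict mapping a part to its candidate states,
              here (position, mixture_type) pairs
    unary(part, state) -> local appearance score
    pairwise(parent, parent_state, child, child_state) -> relation score
    """
    message = {}  # message[child][parent_state] = best score of child's subtree
    argbest = {}  # argbest[child][parent_state] = child state achieving it

    def subtree_score(part, state):
        # Score of `part` in `state` plus the best scores of its subtrees.
        return unary(part, state) + sum(
            message[c][state] for c in children.get(part, []))

    def collect(part, parent):
        # Leaves-to-root pass: children are processed before their parent.
        for child in children.get(part, []):
            collect(child, part)
        if parent is None:
            return
        own = {s: subtree_score(part, s) for s in states[part]}
        message[part], argbest[part] = {}, {}
        for ps in states[parent]:
            best_state, best_score = None, float("-inf")
            for s in states[part]:
                score = own[s] + pairwise(parent, ps, part, s)
                if score > best_score:
                    best_state, best_score = s, score
            message[part][ps] = best_score
            argbest[part][ps] = best_state

    collect(root, None)
    root_state = max(states[root], key=lambda s: subtree_score(root, s))
    total = subtree_score(root, root_state)

    # Root-to-leaves pass: read off the maximizing state of every part.
    config = {root: root_state}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            config[child] = argbest[child][config[part]]
            stack.append(child)
    return total, config


# Hypothetical toy model: 4 parts, 4 positions on a line, 2 mixture types.
PARTS = ["torso", "head", "arm", "hand"]
CHILDREN = {"torso": ["head", "arm"], "arm": ["hand"]}
STATES = {p: list(product(range(4), range(2))) for p in PARTS}
REST_OFFSET = {0: 1, 1: -1}      # preferred child offset for each child type
COOCCURRENCE = [[2, 0], [0, 2]]  # bonus when parent and child types agree


def toy_unary(part, state):
    # Arbitrary deterministic stand-in for a template response.
    pos, typ = state
    return (7 * pos + 3 * typ + len(part)) % 5 - 2


def toy_pairwise(parent, parent_state, child, child_state):
    # Co-occurrence bonus minus a quadratic deformation cost.
    (ppos, ptyp), (cpos, ctyp) = parent_state, child_state
    stretch = cpos - ppos - REST_OFFSET[ctyp]
    return COOCCURRENCE[ptyp][ctyp] - stretch * stretch


if __name__ == "__main__":
    score, config = best_configuration(
        CHILDREN, "torso", STATES, toy_unary, toy_pairwise)
    print(score, config)
```

The inner loop in this sketch is quadratic in the number of states per part, which keeps the code short; it says nothing about how the paper achieves its reported speed. The point is only that the cost grows linearly with the number of parts, whereas enumerating every joint configuration grows exponentially.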
Index Terms:
support vector machines, dynamic programming, image representation, object detection, pose estimation, solid modelling, articulated human detection, Buffy datasets, Parse datasets, structured SVM solver, global geometry dependency, local mixtures, similar warps, articulation modeling, spatial relations, standard pictorial structure models, cooccurrence relations, flexible mixture model, nonoriented parts, warped templates, deformable part model representation, static images, human pose estimation, flexible part mixtures, Computational modeling, Human factors, Object segmentation, Deformable models, Shape analysis, deformable part models, articulated shapes
Citation:
Yi Yang, Deva Ramanan, "Articulated Human Detection with Flexible Mixtures of Parts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2878-2890, Dec. 2013, doi:10.1109/TPAMI.2012.261