Recovering 3D Human Pose from Monocular Images
January 2006 (vol. 28 no. 1)
pp. 44-58
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labeling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression, and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. The loss of depth and limb labeling information often makes the recovery of 3D pose from single silhouettes ambiguous. To handle this, the method is embedded in a novel regressive tracking framework, using dynamics from the previous state estimate together with a learned regression value to disambiguate the pose. We show that the resulting system tracks long sequences stably. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated for several representations of full body pose, both quantitatively on independent but similar test data and qualitatively on real image sequences. Mean angular errors of 4-6° are obtained for a variety of walking motions.
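As a rough illustration of what "direct regression against descriptor vectors over a kernel basis" can look like, the sketch below fits a ridge regressor from descriptor vectors to pose vectors using Gaussian basis functions centered on the training examples. It is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic random vectors standing in for silhouette descriptors and pose angles, and the names `gaussian_kernel_basis`, `fit_ridge`, `beta`, and `lam` are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel_basis(X, centers, beta):
    """Evaluate Gaussian basis functions exp(-beta * ||x - c||^2).

    X: (n, d) descriptor vectors; centers: (k, d) basis centers.
    Returns an (n, k) design matrix.
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-beta * d2)


def fit_ridge(Phi, Y, lam):
    """Regularized least squares: solve (Phi^T Phi + lam I) A = Phi^T Y."""
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ Y)


# Synthetic stand-ins (assumption): random "descriptors" and a smooth
# nonlinear map to a low-dimensional "pose" vector.
rng = np.random.default_rng(0)
n_train, d_desc, d_pose = 200, 10, 3
X = rng.normal(size=(n_train, d_desc))
W_true = rng.normal(size=(d_desc, d_pose))
Y = np.tanh(X @ W_true)

# One basis function per training example, then a ridge fit.
Phi = gaussian_kernel_basis(X, X, beta=0.05)
A = fit_ridge(Phi, Y, lam=1e-3)

# Predict poses for the training descriptors and measure the fit.
Y_hat = Phi @ A
train_rmse = np.sqrt(np.mean((Y_hat - Y) ** 2))
baseline_rmse = np.sqrt(np.mean(Y ** 2))  # error of always predicting zero
print(f"train RMSE {train_rmse:.4f} vs. zero-predictor {baseline_rmse:.4f}")
```

A sparse method such as the RVM would aim to keep only a small subset of the columns of `Phi` active, whereas the ridge fit here uses all of them.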

Index Terms:
Computer vision, human motion estimation, machine learning, multivariate regression.
Citation:
Ankur Agarwal, Bill Triggs, "Recovering 3D Human Pose from Monocular Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 1, pp. 44-58, Jan. 2006, doi:10.1109/TPAMI.2006.21