Issue No. 12 - Dec. 2013 (vol. 35)
pp: 2821-2840
Jamie Shotton, Microsoft Res., Cambridge, UK
Ross Girshick, EERES-COENG Eng. Res., Univ. of California, Berkeley, Berkeley, CA, USA
Andrew Fitzgibbon, Microsoft Res., Cambridge, UK
Toby Sharp, Microsoft Res., Cambridge, UK
Mat Cook, Microsoft Res., Cambridge, UK
Mark Finocchio, Microsoft Corp., Redmond, WA, USA
Pushmeet Kohli, Microsoft Res., Cambridge, UK
Antonio Criminisi, Microsoft Res., Cambridge, UK
Alex Kipman, Microsoft Corp., Redmond, WA, USA
Andrew Blake, Microsoft Res., Cambridge, UK
ABSTRACT
We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features and parallelizable decision forests, both approaches can run super-real time on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.
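The abstract mentions "simple depth pixel comparison features" without giving their form. As a rough illustration only, the sketch below assumes a feature that compares the depth at two probe positions near a pixel, with the probe offsets divided by the depth at that pixel so that the response is roughly invariant to distance from the camera. The function names, the `BACKGROUND_DEPTH` constant, and the rounding of offsets are assumptions made for this example, not details taken from this page.

```python
# Hypothetical constant: depth value returned for probes that fall outside the image.
BACKGROUND_DEPTH = 1e6

def probe(depth, row, col):
    """Return the depth at (row, col), or BACKGROUND_DEPTH if outside the image."""
    if 0 <= row < len(depth) and 0 <= col < len(depth[0]):
        return depth[row][col]
    return BACKGROUND_DEPTH

def depth_feature(depth, pixel, u, v):
    """Difference of two depth probes around `pixel`.

    The offsets u and v (row, col) are divided by the depth at `pixel`,
    so a fixed offset covers a similar extent on the body at any distance.
    """
    r, c = pixel
    d = depth[r][c]
    r1, c1 = r + int(round(u[0] / d)), c + int(round(u[1] / d))
    r2, c2 = r + int(round(v[0] / d)), c + int(round(v[1] / d))
    return probe(depth, r1, c1) - probe(depth, r2, c2)

# Tiny 3x3 depth map (arbitrary units) with one farther pixel at the top centre.
depth_map = [
    [2.0, 5.0, 2.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]
feature_value = depth_feature(depth_map, (1, 1), u=(-2, 0), v=(0, 2))
```

In a decision forest, each split node would typically threshold one such feature value to route a pixel left or right. Because every pixel is evaluated independently, the computation parallelizes naturally.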
INDEX TERMS
Pose estimation, Cameras, Human factors, Shape analysis, Feature extraction, Rendering (computer graphics), games, Computer vision, machine learning, pixel classification, depth cues, range data
CITATION
Jamie Shotton, Ross Girshick, Andrew Fitzgibbon, Toby Sharp, Mat Cook, Mark Finocchio, Richard Moore, Pushmeet Kohli, Antonio Criminisi, Alex Kipman, Andrew Blake, "Efficient Human Pose Estimation from Single Depth Images", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 12, pp. 2821-2840, Dec. 2013, doi:10.1109/TPAMI.2012.241