Vol. 31, No. 1, January 2009
pp. 27-38
Mun Wai Lee, ObjectVideo Inc., Reston
Ramakant Nevatia, University of Southern California, Los Angeles
ABSTRACT
Tracking human body poses in monocular video has many important applications. The problem is challenging in realistic scenes because of background clutter, variation in human appearance, and self-occlusion. Pose tracking becomes even harder when several people are present and their bodies occlude one another. We propose a three-stage approach with a multilevel state representation that enables hierarchical estimation of 3D body poses. The method addresses automatic initialization, data association, and self- and inter-occlusion. In the first stage, humans are tracked as foreground blobs, and their positions and sizes are coarsely estimated. In the second stage, parts such as the face, shoulders, and limbs are detected using various cues, and the results are combined by a grid-based belief propagation algorithm to infer 2D joint positions. In the third stage, the derived belief maps serve as proposal functions for inferring the 3D pose with data-driven Markov chain Monte Carlo. Experimental results on several realistic indoor video sequences show that the method can track multiple people through complex movements, such as sitting and turning, with self- and inter-occlusion.
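As a rough, hypothetical illustration of how the second and third stages fit together, the Python sketch below runs exact sum-product belief propagation over a grid of candidate joint positions and then uses the resulting belief map as the proposal distribution of a Metropolis-Hastings sampler. It rests on simplifying assumptions that are not taken from the paper: a three-part chain (for example shoulder, elbow, wrist) instead of the full body model, hand-made score maps in place of the face, shoulder, and limb detectors, a Gaussian pairwise term with an assumed preferred offset, and a single 2D joint position standing in for the 3D pose state sampled in the third stage. The helpers `grid_cells`, `pairwise`, `chain_bp`, and `mh_from_belief_map` are illustrative names only.

```python
import math
import random

# Toy, hypothetical setup (not the paper's detectors or parameters):
# a chain of body parts, e.g. shoulder -> elbow -> wrist, each located on a
# small H x W grid of image cells.

def grid_cells(h, w):
    """All (row, col) cells of an h x w grid, in row-major order."""
    return [(r, c) for r in range(h) for c in range(w)]

def pairwise(a, b, offset, sigma=1.0):
    """Unnormalised compatibility of a child at cell b given its parent at a.
    Peaks when b - a equals the preferred `offset` (assumed Gaussian)."""
    dr = b[0] - a[0] - offset[0]
    dc = b[1] - a[1] - offset[1]
    return math.exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma))

def chain_bp(unaries, offsets, cells):
    """Exact sum-product belief propagation on a chain of parts.

    unaries[i][k]: detector score for part i at cells[k] (must be > 0)
    offsets[i]:    preferred displacement from part i to part i + 1
    Returns one normalised belief map (list of probabilities) per part.
    """
    n, m = len(unaries), len(cells)
    psi = [[[pairwise(cells[a], cells[b], offsets[i]) for b in range(m)]
            for a in range(m)] for i in range(n - 1)]
    # fwd[i]: message reaching part i from its parent side (ones at the root).
    fwd = [[1.0] * m for _ in range(n)]
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][a] * unaries[i - 1][a] * psi[i - 1][a][b]
                      for a in range(m)) for b in range(m)]
    # bwd[i]: message reaching part i from its child side (ones at the leaf).
    bwd = [[1.0] * m for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(psi[i][a][b] * unaries[i + 1][b] * bwd[i + 1][b]
                      for b in range(m)) for a in range(m)]
    beliefs = []
    for i in range(n):
        b = [fwd[i][k] * unaries[i][k] * bwd[i][k] for k in range(m)]
        z = sum(b)
        beliefs.append([x / z for x in b])
    return beliefs

def mh_from_belief_map(target, proposal, n_iter, seed=0):
    """Independence Metropolis-Hastings over grid cells.

    Candidates are drawn from `proposal` (here a BP belief map) and accepted
    with probability min(1, target[y] * q[x] / (target[x] * q[y])), so the
    chain's stationary distribution is proportional to `target`. Assumes
    target > 0 wherever proposal > 0. Returns per-cell visit frequencies.
    """
    rng = random.Random(seed)
    states = range(len(proposal))
    cum, total = [], 0.0
    for q in proposal:
        total += q
        cum.append(total)
    x = rng.choices(states, cum_weights=cum)[0]
    counts = [0] * len(proposal)
    for _ in range(n_iter):
        y = rng.choices(states, cum_weights=cum)[0]
        ratio = (target[y] * proposal[x]) / (target[x] * proposal[y])
        if rng.random() < ratio:  # accepts with probability min(1, ratio)
            x = y
        counts[x] += 1
    return [c / n_iter for c in counts]

if __name__ == "__main__":
    cells = grid_cells(4, 4)

    def peak(center):
        # Hypothetical detector map: a bump at `center` on a small floor.
        return [0.1 + math.exp(-((r - center[0]) ** 2 + (c - center[1]) ** 2) / 2.0)
                for r, c in cells]

    # Shoulder and wrist are "detected"; the elbow has no direct evidence.
    unaries = [peak((0, 0)), [1.0] * len(cells), peak((2, 2))]
    elbow = chain_bp(unaries, offsets=[(1, 1), (1, 1)], cells=cells)[1]
    best = max(range(len(cells)), key=elbow.__getitem__)
    print("most likely elbow cell:", cells[best])

    # Stand-in for a finer image likelihood evaluated in the sampling stage.
    target = [b * u for b, u in zip(elbow, peak((1, 2)))]
    freqs = mh_from_belief_map(target, elbow, n_iter=50_000)
    print("sampled frequency of best BP cell:", round(freqs[best], 3))
```

Because the acceptance ratio includes the proposal probabilities, the sampler targets `target` even when the belief map is only approximately right, as long as the belief map is positive wherever the target is; a sharper belief map mainly raises the acceptance rate. That is one plausible reason for using bottom-up belief maps as proposals rather than as final estimates.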
INDEX TERMS
Computer vision, Image Processing and Computer Vision
CITATION
Mun Wai Lee, Ramakant Nevatia, "Human Pose Tracking in Monocular Sequence Using Multilevel Structured Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 27-38, January 2009, doi:10.1109/TPAMI.2008.35