Issue No.03 - March (2009 vol.31)
pp: 520-538
Ahmed Elgammal, Rutgers University, Piscataway
Chan-Su Lee, Rutgers University, Piscataway
ABSTRACT
We present a framework for monocular 3D kinematic pose tracking and viewpoint estimation of periodic and quasi-periodic human motions from an uncalibrated camera. The approach is based on supervised learning of both the visual observation manifold and the kinematic manifold of the motion using a joint representation. We show that the visual manifold of the observed shape of a human performing a periodic motion, observed from different viewpoints, is topologically equivalent to a torus manifold. Instead of learning an embedding of the manifold, we learn the geometric deformation between an ideal manifold (a conceptually equivalent topological structure) and a twisted version of the manifold (the data). Experimental results show accurate estimation of the 3D body posture and the viewpoint from a single uncalibrated camera.
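The torus idea in the abstract can be made concrete with a small sketch. It assumes two periodic parameters, a body-configuration phase `mu` and a view angle `nu`, placed on an ideal torus in 3D, and then fits a regression from that torus to observation vectors. The Gaussian RBF regression, the radii `R` and `r`, the kernel width `sigma`, and the synthetic observations are all illustrative assumptions; the abstract does not specify the mapping, so this should not be read as the authors' implementation.

```python
import numpy as np


def torus_embedding(mu, nu, R=2.0, r=1.0):
    """Place (mu, nu), both in radians, on an ideal torus in R^3.

    mu: body-configuration phase (around the tube), nu: view angle
    (around the central axis). R and r are assumed radii.
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    ring = R + r * np.cos(mu)
    return np.stack([ring * np.cos(nu), ring * np.sin(nu), r * np.sin(mu)], axis=-1)


def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian RBF kernel matrix between two sets of 3D points."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_mapping(train_pts, observations, sigma=0.5, reg=1e-8):
    """Regularized RBF regression from torus points to observation vectors."""
    K = gaussian_kernel(train_pts, train_pts, sigma)
    return np.linalg.solve(K + reg * np.eye(len(train_pts)), observations)


def predict(train_pts, weights, query_pts, sigma=0.5):
    """Evaluate the fitted mapping at new torus points."""
    return gaussian_kernel(query_pts, train_pts, sigma) @ weights


def synthetic_observations(mu, nu):
    """Hypothetical stand-in for shape descriptors; periodic in mu and nu."""
    return np.stack(
        [np.cos(mu), np.sin(mu), np.cos(nu), np.sin(nu), np.cos(mu + nu)], axis=-1
    )


# Supervised setting: the (mu, nu) label of every training sample is known.
angles = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
mu_grid, nu_grid = np.meshgrid(angles, angles, indexing="ij")
mu_flat, nu_flat = mu_grid.ravel(), nu_grid.ravel()

train_pts = torus_embedding(mu_flat, nu_flat)    # ideal manifold, shape (36, 3)
obs = synthetic_observations(mu_flat, nu_flat)   # "twisted" data, shape (36, 5)
weights = fit_mapping(train_pts, obs)
```

Because the embedding is periodic in both angles, a tracker built on such a mapping could search over `(mu, nu)` without special handling at the wrap-around points; estimating pose and view would then amount to finding the torus point whose predicted observation best matches the input.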
INDEX TERMS
Motion, Shape, Video analysis
CITATION
Ahmed Elgammal, Chan-Su Lee, "Tracking People on a Torus", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.31, no. 3, pp. 520-538, March 2009, doi:10.1109/TPAMI.2008.101