Tracking People by Learning Their Appearance
January 2007 (vol. 29 no. 1)
pp. 65-81
An open problem in computer vision is to automatically track the articulations of people in a video sequence. The problem is difficult because one must both determine the number of people in each frame and estimate their configurations. Finding people and localizing their limbs is hard because people can move fast and unpredictably, can appear in a variety of poses and clothes, and are often surrounded by limb-like clutter. We develop a completely automatic system that works in two stages: it first builds a model of the appearance of each person in a video, and then it tracks by detecting those models in each frame ("tracking by model-building and detection"). We develop two algorithms that build models. The first is a bottom-up approach that groups together candidate body parts found throughout a sequence. The second is a top-down approach that automatically builds people-models by detecting convenient key poses within a sequence. Finally, we show that building a discriminative model of appearance is quite helpful, since it exploits structure in the background (without background subtraction). We demonstrate the resulting tracker on hundreds of thousands of frames of unscripted indoor and outdoor activity, a feature-length film ("Run Lola Run"), and legacy sports footage (from the 2002 World Series and 1998 Winter Olympics). Experiments suggest that our system 1) can count distinct individuals, 2) can identify and track them, 3) can recover when it loses track, for example, if individuals are occluded or briefly leave the view, 4) can identify body configuration accurately, and 5) is not dependent on particular models of human motion.
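The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each frame has already been reduced to a list of candidate patches given as (position, RGB color) pairs, it swaps the paper's body-part grouping for a simple color-bin vote, and the names `build_appearance_model` and `track_by_detection` as well as the `step` and `max_dist` parameters are hypothetical. It only shows the structure: build one appearance model from the whole sequence, then detect that model independently in every frame, with no motion model.

```python
from collections import Counter

def quantize(color, step=32):
    """Map an RGB color to a coarse bin so similar colors share a key."""
    return tuple(c // step for c in color)

def build_appearance_model(frames, step=32):
    """Stage 1 (toy): find the color bin that occurs in the most frames
    and return the mean color of the candidates that fall in that bin."""
    frame_counts = Counter()
    members = {}
    for candidates in frames:
        seen = set()
        for _pos, color in candidates:
            key = quantize(color, step)
            members.setdefault(key, []).append(color)
            seen.add(key)
        frame_counts.update(seen)  # count each bin at most once per frame
    best_key, _ = frame_counts.most_common(1)[0]
    cluster = members[best_key]
    n = len(cluster)
    return tuple(sum(c[i] for c in cluster) / n for i in range(3))

def color_distance(a, b):
    """Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def track_by_detection(frames, model, max_dist=40.0):
    """Stage 2 (toy): in each frame, independently pick the candidate whose
    color is closest to the model; report None when nothing is close enough."""
    track = []
    for candidates in frames:
        best = min(candidates,
                   key=lambda c: color_distance(c[1], model),
                   default=None)
        if best is not None and color_distance(best[1], model) <= max_dist:
            track.append(best[0])
        else:
            track.append(None)
    return track

# Made-up data: a red-clothed person plus clutter; the person is hidden in frame 2.
frames = [
    [((10, 20), (200, 30, 30)), ((50, 60), (90, 90, 90))],
    [((12, 21), (205, 28, 30)), ((80, 15), (20, 140, 60))],
    [((70, 70), (95, 100, 88))],
    [((15, 24), (198, 25, 29))],
]
model = build_appearance_model(frames)
track = track_by_detection(frames, model)
print(track)
```

Because each frame is searched independently, the toy tracker picks the person up again in the last frame after the occluded frame, which mirrors the recovery behavior the abstract describes.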

Index Terms:
People tracking, motion capture, surveillance.
Citation:
Deva Ramanan, David A. Forsyth, Andrew Zisserman, "Tracking People by Learning Their Appearance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 65-81, Jan. 2007, doi:10.1109/TPAMI.2007.22