Unsupervised Learning of Human Motion
July 2003 (vol. 25 no. 7)
pp. 814-827

Abstract: We present an unsupervised learning algorithm that automatically obtains, from unlabeled training data, a probabilistic model of an object composed of a collection of parts (in our examples, a moving human body). The training data include both useful “foreground” features and features arising from irrelevant background clutter, and the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs, a structure that allows fast detection. To learn both the model structure and the model parameters, we develop an EM-like algorithm in which the labeling of the data (the part assignments) is treated as a set of hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. We demonstrate the efficiency and effectiveness of the algorithm by applying it to learn models of human motion automatically from unlabeled image sequences and by testing the learned models on a variety of sequences.
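
Why decomposable triangulated graphs allow fast detection can be illustrated with a minimal Python sketch. It assumes the joint density factorizes over the triangles of an elimination order (each earlier triangle a conditional term of its eliminated part given the other two, the last triangle the joint term of its three parts), so the log-likelihood of a labeling is a sum of per-triangle scores. The callback `score(t, fa, fb, fc)` and the names `best_labeling`, `_get` and `_add` are hypothetical; this is not the authors' implementation. Eliminating one part at a time replaces the search over all N^M labelings of M parts by steps that each cost O(N^3) for N candidate features.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Triangle = Tuple[int, int, int]               # (a, b, c): part a is eliminated; b, c remain
Score = Callable[[int, int, int, int], float]  # hypothetical score(t, fa, fb, fc) for triangle t
Table = List[List[float]]
Messages = Dict[Tuple[int, int], Table]       # key (u, v) with u < v, indexed table[f_u][f_v]


def _get(msgs: Messages, u: int, v: int, fu: int, fv: int) -> float:
    """Accumulated message on the part pair (u, v) at features (fu, fv); 0 if none."""
    if u < v:
        table = msgs.get((u, v))
        return 0.0 if table is None else table[fu][fv]
    table = msgs.get((v, u))
    return 0.0 if table is None else table[fv][fu]


def _add(msgs: Messages, u: int, v: int, table_uv: Table, n: int) -> None:
    """Add a message indexed [f_u][f_v] to pair (u, v), stored as [f_min][f_max]."""
    if u > v:
        table_uv = [[table_uv[j][i] for j in range(n)] for i in range(n)]
    key = (min(u, v), max(u, v))
    old = msgs.get(key)
    if old is None:
        msgs[key] = table_uv
    else:
        for i in range(n):
            for j in range(n):
                old[i][j] += table_uv[i][j]


def best_labeling(triangles: Sequence[Triangle], n: int,
                  score: Score) -> Tuple[float, Dict[int, int]]:
    """Max-sum dynamic programming over an elimination order of triangles.

    Returns the best total score and a map part -> candidate feature index.
    Two parts may share a feature; missing parts and clutter are not modeled.
    """
    if not triangles:
        raise ValueError("need at least one triangle")
    msgs: Messages = {}
    argmax: List[List[List[int]]] = []
    eliminated = set()
    for t, (a, b, c) in enumerate(triangles[:-1]):
        # Each step must use live parts, and a's only live neighbours must be b and c.
        if {a, b, c} & eliminated or any(
                a in key and not set(key) <= {a, b, c} for key in msgs):
            raise ValueError(f"triangle {t} is not a valid elimination step")
        eliminated.add(a)
        table = [[-math.inf] * n for _ in range(n)]
        arg = [[0] * n for _ in range(n)]
        for fb, fc, fa in itertools.product(range(n), repeat=3):
            s = score(t, fa, fb, fc) + _get(msgs, a, b, fa, fb) + _get(msgs, a, c, fa, fc)
            if s > table[fb][fc]:
                table[fb][fc], arg[fb][fc] = s, fa
        # Messages touching a are consumed; the maximized result moves to (b, c).
        msgs.pop((min(a, b), max(a, b)), None)
        msgs.pop((min(a, c), max(a, c)), None)
        _add(msgs, b, c, table, n)
        argmax.append(arg)

    t = len(triangles) - 1
    a, b, c = triangles[t]
    if {a, b, c} & eliminated or any(not set(key) <= {a, b, c} for key in msgs):
        raise ValueError("the last triangle must contain every remaining part")
    best, best_f = -math.inf, (0, 0, 0)
    for fa, fb, fc in itertools.product(range(n), repeat=3):
        s = (score(t, fa, fb, fc) + _get(msgs, a, b, fa, fb)
             + _get(msgs, a, c, fa, fc) + _get(msgs, b, c, fb, fc))
        if s > best:
            best, best_f = s, (fa, fb, fc)

    # Backtrack: the b and c of each step are labeled before its eliminated part a.
    labels = {a: best_f[0], b: best_f[1], c: best_f[2]}
    for t in range(len(triangles) - 2, -1, -1):
        a, b, c = triangles[t]
        labels[a] = argmax[t][labels[b]][labels[c]]
    return best, labels


if __name__ == "__main__":
    # Toy example: 5 parts, 3 candidate features, arbitrary deterministic scores.
    tris = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
    toy = lambda t, fa, fb, fc: math.sin(1 + 27 * t + 9 * fa + 3 * fb + fc)
    print(best_labeling(tris, 3, toy))
```

The sketch lets two parts claim the same feature and ignores missing parts and background clutter, all of which a full detector would have to handle. Comparing its result against brute-force enumeration on a small problem is a simple sanity check.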

Index Terms:
Unsupervised learning, human motion, decomposable triangulated graph, probabilistic models, greedy search, EM algorithm, mixture models.
Yang Song, Luis Goncalves, Pietro Perona, "Unsupervised Learning of Human Motion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 814-827, July 2003, doi:10.1109/TPAMI.2003.1206511