Parametric Hidden Markov Models for Gesture Recognition
September 1999 (vol. 21 no. 9)
pp. 884-900

Abstract—A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a point gesture where the relevant parameter is the two-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results that demonstrate the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness in parameter estimation with respect to noise in the input features. Last, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires the use of a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical coordinate parameterization of pointing direction.
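The linear dependence mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact form, so the code below assumes that each state's Gaussian output mean is an affine function of the gesture parameter, mean_j(theta) = W_j theta + mu_bar_j, and that the state responsibilities (gamma) are already available from an E-step. The names `state_means`, `estimate_theta`, `W`, `mu_bar`, `Sigma`, and `gamma` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np

def state_means(W, mu_bar, theta):
    """Output mean of each state as an affine function of theta (assumed form).

    W: (J, D, K) per-state linear maps, mu_bar: (J, D) offsets, theta: (K,).
    Returns an array of shape (J, D).
    """
    return W @ theta + mu_bar

def estimate_theta(X, gamma, W, mu_bar, Sigma):
    """Weighted least-squares estimate of theta for fixed responsibilities.

    X: (T, D) observations, gamma: (T, J) state responsibilities,
    Sigma: (J, D, D) per-state output covariances.
    Solves the normal equations obtained by setting the gradient of the
    expected log-likelihood with respect to theta to zero.
    """
    J, D, K = W.shape
    A = np.zeros((K, K))
    b = np.zeros(K)
    for j in range(J):
        precision = np.linalg.inv(Sigma[j])
        WtP = W[j].T @ precision              # shape (K, D)
        w = gamma[:, j]                       # responsibility of state j per frame
        A += w.sum() * (WtP @ W[j])
        b += WtP @ (w @ (X - mu_bar[j]))      # weighted sum of residuals, shape (D,)
    return np.linalg.solve(A, b)

# Synthetic check: noiseless data generated from a known theta.
rng = np.random.default_rng(0)
J, D, K, T = 3, 3, 2, 30
W = rng.normal(size=(J, D, K))
mu_bar = rng.normal(size=(J, D))
Sigma = np.stack([np.eye(D)] * J)
theta_true = np.array([0.5, -1.0])
states = np.arange(T) % J                     # hard state assignment per frame
X = state_means(W, mu_bar, theta_true)[states]
gamma = np.eye(J)[states]                     # one-hot responsibilities
theta_hat = estimate_theta(X, gamma, W, mu_bar, Sigma)
```

In a full recognition loop, one would presumably alternate this update with a forward-backward pass that recomputes `gamma` under the current `theta`, repeating until the likelihood stops improving. The nonlinear extension described in the abstract would replace the closed-form solve with a gradient-style step, which is why it calls for a generalized EM procedure.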

Index Terms:
Gesture recognition, hidden Markov models, expectation-maximization algorithm, time-series modeling, computer vision.
Citation:
Andrew D. Wilson, Aaron F. Bobick, "Parametric Hidden Markov Models for Gesture Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 9, pp. 884-900, Sept. 1999, doi:10.1109/34.790429