Extraction of 2D Motion Trajectories and Its Application to Hand Gesture Recognition
August 2002 (vol. 24 no. 8)
pp. 1061-1074

We present an algorithm for extracting and classifying two-dimensional motion in an image sequence based on motion trajectories. First, a multiscale segmentation is performed to generate homogeneous regions in each frame. Regions between consecutive frames are then matched to obtain two-view correspondences. Affine transformations are computed from each pair of corresponding regions to define pixel matches. Pixel matches over consecutive image pairs are concatenated to obtain pixel-level motion trajectories across the image sequence. Motion patterns are learned from the extracted trajectories using a time-delay neural network. We apply the proposed method to recognize 40 hand gestures of American Sign Language. Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories.
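
The affine-matching and concatenation steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each matched region pair is already available as a set of corresponding sample points, fits the affine map by ordinary least squares, and uses hypothetical helper names (`fit_affine`, `apply_affine`, `chain_trajectory`) that do not come from the paper.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map taking src points (N, 2) to dst points (N, 2).

    Assumption: src and dst are corresponding points sampled from a matched
    region pair in two consecutive frames.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is [x, y, 1].
    X = np.hstack([src, np.ones((len(src), 1))])
    # Solve X @ M = dst for the 3x2 affine parameter matrix M.
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M

def apply_affine(M, pts):
    """Map points (N, 2) through the 3x2 affine matrix M."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

def chain_trajectory(start, affines):
    """Follow one pixel through a list of per-frame-pair affine maps."""
    traj = [np.asarray(start, dtype=float)]
    for M in affines:
        # The pixel's position in the next frame is its current position
        # mapped through that frame pair's affine transformation.
        traj.append(apply_affine(M, traj[-1][None, :])[0])
    return np.stack(traj)

# Toy example: a region that shifts by (2, 1) between consecutive frames.
region_t = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
M = fit_affine(region_t, region_t + [2, 1])
trajectory = chain_trajectory([0, 0], [M, M])  # positions in three frames
```

In this toy case the fitted map is a pure translation, so the pixel starting at the origin moves by the same offset in every frame pair. A real sequence would use a different affine map per region and per frame pair.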

Index Terms:
Motion segmentation, motion analysis, motion trajectory, American Sign Language, hand gesture recognition, time-delay neural network.
Citation:
Ming-Hsuan Yang, Narendra Ahuja, Mark Tabb, "Extraction of 2D Motion Trajectories and Its Application to Hand Gesture Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1061-1074, Aug. 2002, doi:10.1109/TPAMI.2002.1023803