Issue No. 09 - September 2009 (vol. 31)
pp. 1685-1699
Vassilis Athitsos , University of Texas at Arlington, Arlington
Jonathan Alon , Boston University, Boston
Stan Sclaroff , Boston University, Boston
ABSTRACT
Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume either known spatial segmentation or known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when information about when the gesture begins and ends is unavailable. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed method consists of three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts, in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).
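To illustrate the kind of matching the abstract describes, the following is a minimal sketch, not the paper's implementation, of spotting-style dynamic time warping in which every query frame contributes several candidate hand detections and the local cost is taken over the best candidate. The function name, the use of 2D point features, the Euclidean local cost, and the standard DTW transitions are illustrative assumptions.

import numpy as np

def dtw_spotting_multi_candidates(model, query_candidates):
    """Sketch only (not the authors' algorithm).
    model: list of M feature vectors, one per model frame.
    query_candidates: list of Q lists; each inner list holds the feature
    vectors of the candidate hand detections in that query frame.
    Returns (best cumulative cost, query frame where the best match ends)."""
    M, Q = len(model), len(query_candidates)
    INF = float("inf")
    D = np.full((M + 1, Q + 1), INF)
    D[0, :] = 0.0  # a warping path may start at any query frame (unknown gesture start)
    for i in range(1, M + 1):
        for j in range(1, Q + 1):
            # Local cost: distance from model frame i to the best candidate in query frame j.
            cost = min(np.linalg.norm(np.asarray(model[i - 1]) - np.asarray(c))
                       for c in query_candidates[j - 1])
            # Standard DTW transitions: diagonal match, skip a model frame, skip a query frame.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    j_end = int(np.argmin(D[M, 1:])) + 1  # unknown gesture end: pick the cheapest ending frame
    return D[M, j_end], j_end

# Example: a 3-frame model matched against 4 query frames with 2 candidate detections each.
if __name__ == "__main__":
    model = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    query = [[[0.1, 0.0], [5.0, 5.0]],
             [[0.9, 1.1], [4.0, 4.0]],
             [[1.5, 1.5], [3.0, 3.0]],
             [[2.1, 1.9], [6.0, 6.0]]]
    print(dtw_spotting_multi_candidates(model, query))

With M model frames, Q query frames, and at most K candidates per frame, this table fills in O(MQK) time; the paper's pruning and subgesture reasoning address the cost and false-match issues that such an exhaustive search would face in practice.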
INDEX TERMS
Gesture recognition, gesture spotting, human motion analysis, dynamic time warping, continuous dynamic programming.
CITATION
Vassilis Athitsos, Jonathan Alon, and Stan Sclaroff, "A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1685-1699, September 2009, doi:10.1109/TPAMI.2008.203