Issue No. 7, July 2009 (vol. 31), pp. 1264-1277
Stan Sclaroff , Boston University, Boston
Hee-Deok Yang , Korea University, Seoul
ABSTRACT
Sign language spotting is the task of detecting and recognizing signs from a given vocabulary within a signed utterance. The difficulty of sign language spotting is that instances of signs vary in both motion and appearance. Moreover, signs appear within a continuous gesture stream, interspersed with transitional movements between signs in a vocabulary and nonsign patterns (which include out-of-vocabulary signs, epentheses, and other movements that do not correspond to signs). In this paper, a novel method for designing threshold models in a conditional random field (CRF) model is proposed, which provides an adaptive threshold for distinguishing between signs in a vocabulary and nonsign patterns. A short-sign detector, a hand appearance-based sign verification method, and a subsign reasoning method are included to further improve sign language spotting accuracy. Experiments demonstrate that our system can spot signs from continuous data with an 87.0 percent spotting rate and can recognize signs from isolated data with a 93.5 percent recognition rate, versus 73.5 percent and 85.4 percent, respectively, for CRFs without a threshold model, short-sign detection, subsign reasoning, and hand appearance-based sign verification. Our system also achieves a 15.0 percent sign error rate (SER) on continuous data and a 6.4 percent SER on isolated data, versus 76.2 percent and 14.5 percent, respectively, for conventional CRFs.
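The core idea of the threshold model can be sketched as follows: instead of a fixed score cutoff, a dedicated non-sign label is scored on the same observations, and a candidate sign is accepted only if it outscores that label. The sketch below is a minimal toy under assumed labels, features, and weights; it is not the paper's CRF, whose threshold model is built from the trained feature functions of the sign labels.

```python
# Toy illustration of adaptive thresholding for sign spotting: a special
# NONSIGN label is scored on the observed frames, and its score serves as
# the (utterance-dependent) rejection threshold. All labels, feature names,
# and weights here are hypothetical.

def sequence_score(observations, label, weights):
    """Sum of per-frame feature weights for a candidate label."""
    return sum(weights[label].get(obs, 0.0) for obs in observations)

def spot_sign(observations, vocabulary, weights, nonsign_label="NONSIGN"):
    """Return the best vocabulary sign only if it outscores the non-sign model."""
    threshold = sequence_score(observations, nonsign_label, weights)
    best_sign = max(vocabulary,
                    key=lambda s: sequence_score(observations, s, weights))
    best_score = sequence_score(observations, best_sign, weights)
    return best_sign if best_score > threshold else None

# Hypothetical per-label feature weights.
weights = {
    "HELLO":   {"wave": 2.0, "raise": 1.0},
    "THANKS":  {"chin": 2.0, "forward": 1.5},
    "NONSIGN": {"wave": 0.5, "raise": 0.5, "chin": 0.5,
                "forward": 0.5, "rest": 2.0},
}

print(spot_sign(["wave", "raise"], ["HELLO", "THANKS"], weights))  # HELLO
print(spot_sign(["rest", "rest"], ["HELLO", "THANKS"], weights))   # None
```

Because the NONSIGN score rises and falls with the observations, the rejection boundary adapts to each utterance rather than relying on a single global cutoff, which is the motivation for a threshold model over a fixed threshold.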
INDEX TERMS
Sign language recognition, sign language spotting, conditional random field, threshold model.
CITATION
Stan Sclaroff, Hee-Deok Yang, "Sign Language Spotting with a Threshold Model Based on Conditional Random Fields", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.31, no. 7, pp. 1264-1277, July 2009, doi:10.1109/TPAMI.2008.172