2015 IEEE International Conference on Multimedia and Expo (ICME) (2015)
June 29, 2015 to July 3, 2015
Jie Huang, University of Science and Technology of China, Hefei, China
Wengang Zhou, University of Science and Technology of China, Hefei, China
Houqiang Li, University of Science and Technology of China, Hefei, China
Weiping Li, University of Science and Technology of China, Hefei, China
Sign Language Recognition (SLR) aims to interpret sign language into text or speech, so as to facilitate communication between deaf-mute and hearing people. This task has broad social impact, but remains very challenging due to the complexity and large variations of hand actions. Existing methods for SLR use hand-crafted features to describe sign language motion and build classification models on top of those features. However, it is difficult to design reliable features that adapt to the large variations of hand gestures. To address this problem, we propose a novel 3D convolutional neural network (CNN) that automatically extracts discriminative spatio-temporal features from the raw video stream without any prior knowledge, thus avoiding hand-crafted feature design. To boost performance, multiple channels of video streams, including color information, depth cues, and body joint positions, are fed as input to the 3D CNN in order to integrate color, depth, and trajectory information. We validate the proposed model on a real dataset collected with Microsoft Kinect and demonstrate its effectiveness over traditional approaches based on hand-crafted features.
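The core operation the abstract describes is a 3D convolution that slides over both time and space, with per-channel responses (e.g. color, depth, joint trajectories) summed into one feature map. The following is a minimal NumPy sketch of that idea, not the paper's actual architecture; the channel/kernel shapes here are illustrative assumptions.

```python
import numpy as np

def conv3d(video, kernel):
    """Valid 3D cross-correlation of a single-channel video volume
    (frames, height, width) with a kernel (kT, kH, kW)."""
    T, H, W = video.shape
    kT, kH, kW = kernel.shape
    out = np.zeros((T - kT + 1, H - kH + 1, W - kW + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(video[t:t + kT, y:y + kH, x:x + kW] * kernel)
    return out

# Hypothetical multi-channel input: three (frames, height, width) volumes
# standing in for color, depth, and body-joint streams. A multi-channel
# 3D conv layer sums the per-channel responses into one feature map.
rng = np.random.default_rng(0)
channels = [rng.random((8, 16, 16)) for _ in range(3)]
kernels = [rng.random((3, 3, 3)) for _ in range(3)]
feature_map = sum(conv3d(c, k) for c, k in zip(channels, kernels))
print(feature_map.shape)  # (6, 14, 14): temporal and spatial extent shrink by kernel size - 1
```

In a real network this operation is stacked with learned kernels, nonlinearities, and pooling, so the temporal dimension lets the filters respond to motion patterns rather than single frames.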
Three-dimensional displays, Feature extraction, Gesture recognition, Assistive technology, Hidden Markov models, Convolution, Trajectory
Jie Huang, Wengang Zhou, Houqiang Li and Weiping Li, "Sign Language Recognition using 3D convolutional neural networks," 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 2015, pp. 1-6.