Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review
July 1997 (vol. 19 no. 7)
pp. 677-695

Abstract—The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help in achieving the ease and naturalness desired for HCI. This has motivated a very active research area concerned with computer vision-based analysis and interpretation of hand gestures. We survey the literature on visual interpretation of hand gestures in the context of its role in HCI. The discussion is organized on the basis of the method used for modeling, analyzing, and recognizing gestures. Important differences among gesture interpretation approaches arise depending on whether a 3D model of the human hand or an image appearance model of the human hand is used. 3D hand models allow more elaborate modeling of hand gestures but lead to computational hurdles that have not yet been overcome under the real-time requirements of HCI. Appearance-based models lead to computationally efficient "purposive" approaches that work well under constrained situations but seem to lack the generality desirable for HCI. We also discuss implemented gestural systems as well as other potential applications of vision-based gesture recognition. Although the current progress is encouraging, further theoretical as well as computational advances are needed before gestures can be widely used for HCI. We discuss directions of future research in gesture recognition, including its integration with other natural modes of human-computer interaction.

[1] J.F. Abramatic, P. Letellier, and M. Nadler, "A Narrow-Band Video Communication System for the Transmission of Sign Language Over Ordinary Telephone Lines," Image Sequences Processing and Dynamic Scene Analysis, T.S. Huang, ed., pp. 314-336. Berlin and Heidelberg: Springer-Verlag, 1983.
[2] H.J. Sips, K. van Reeuwijk, and W. Denissen, "Analysis of Local Enumeration and Storage Schemes in HPF," Proc. Int'l Conf. Supercomputing, 1996.
[3] S. Ahmad and V. Tresp, "Classification With Missing and Uncertain Inputs," Proc. Int'l Conf. Neural Networks, vol. 3, pp. 1,949-1,954, 1993.
[4] S. Ahmad, "A Usable Real-Time 3D Hand Tracker," IEEE Asilomar Conf., 1994.
[5] J. Aloimonos, I. Weiss, and A. Bandyopadhyay, "Active Vision," Int'l J. Computer Vision, vol. 1, pp. 333-356, 1988.
[6] A. Azarbayejani, C. Wren, and A. Pentland, "Real-Time 3D Tracking of the Human Body," Proc. IMAGE'COM 96, Bordeaux, France, 1996.
[7] Y. Azoz, L. Devi, and R. Sharma, "Vision-Based Human Arm Tracking for Gesture Analysis Using Multimodal Constraint Fusion," Proc. 1997 Advanced Display Federated Laboratory Symp., Adelphi, Md., Jan. 1997.
[8] R. Bajcsy, "Active Perception," Proc. IEEE, vol. 78, pp. 996-1,005, 1988.
[9] T. Baudel and M. Beaudouin-Lafon, "Charade: Remote Control of Objects Using Free-Hand Gestures," Comm. ACM, vol. 36, no. 7, pp. 28-35, 1993.
[10] D.A. Becker and A. Pentland, "Using a Virtual Environment to Teach Cancer Patients T'ai Chi, Relaxation, and Self-Imagery," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., Oct. 1996.
[11] A. Blake and A. Yuille, Active Vision. Cambridge, Mass.: MIT Press, 1992.
[12] A.F. Bobick and J.W. Davis, "Real-Time Recognition of Activity Using Temporal Templates," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., Oct. 1996.
[13] H.A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Norwell, Mass.: Kluwer Academic Publishers, 1994.
[14] U. Bröckl-Fox, "Real-Time 3D Interaction With Up to 16 Degrees of Freedom From Monocular Image Flows," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 172-178, June 1995.
[15] L.W. Campbell, D.A. Becker, A. Azarbayejani, A.F. Bobick, and A. Pentland, "Invariant Features for 3D Gesture Recognition," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 157-162, Oct. 1996.
[16] C. Cedras and M. Shah, "Motion-Based Recognition: A Survey," Image and Vision Computing, vol. 11, pp. 129-155, 1995.
[17] K. Cho and S.M. Dunn, “Learning Shape Classes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 882-887, Sept. 1994.
[18] R. Cipolla and N.J. Hollinghurst, "Human-Robot Interface by Pointing With Uncalibrated Stereo Vision," Image and Vision Computing, vol. 14, pp. 171-178, Mar. 1996.
[19] R. Cipolla, Y. Okamoto, and Y. Kuno, "Robust Structure from Motion Using Motion Parallax," Proc. Int'l Conf. Computer Vision, pp. 374-382, Berlin, May 1993.
[20] E. Clergue, M. Goldberg, N. Madrane, and B. Merialdo, "Automatic Face and Gestural Recognition for Video Indexing," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 110-115, June 1995.
[21] T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, "Active Shape Models—Their Training and Application," Computer Vision and Image Understanding, vol. 61, pp. 38-59, Jan. 1995.
[22] J.L. Crowley, F. Berard, and J. Coutaz, "Finger Tracking As an Input Device for Augmented Reality," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 195-200, June 1995.
[23] Y. Cui and J. Weng, "Learning-Based Hand Sign Recognition," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 201-206, June 1995.
[24] Y. Cui and J.J. Weng, "Hand Segmentation Using Learning-Based Prediction and Verification for Hand Sign Recognition," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 88-93, Oct. 1996.
[25] T. Darrell, I. Essa, and A. Pentland, “Task-Specific Gesture Analysis in Real-Time Using Interpolated Views,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 12, pp. 1,236-1,242, Dec. 1996.
[26] T. Darrell and A.P. Pentland, "Attention-Driven Expression and Gesture Analysis in an Interactive Environment," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 135-140, June 1995.
[27] J. Davis and M. Shah, "Determining 3D Hand Motion," Proc. 28th Asilomar Conf. Signals, Systems, and Computers, 1994.
[28] J. Davis and M. Shah, "Recognizing Hand Gestures," Proc. European Conf. Computer Vision, Stockholm, Sweden, pp. 331-340, 1994.
[29] A. C. Downton and H. Drouet, "Image Analysis for Model-Based Sign Language Coding," Progress in Image Analysis and Processing II: Proc. Sixth Int'l Conf. Image Analysis and Processing, pp. 637-644, 1991.
[30] I. Essa and A. Pentland, "Facial Expression Recognition Using a Dynamic Model and Motion Energy," Proc. Int'l Conf. Computer Vision, pp. 360-367, Cambridge, Mass., 1995.
[31] M. Etoh, A. Tomono, and F. Kishino, "Stereo-Based Description by Generalized Cylinder Complexes From Occluding Contours," Systems and Computers in Japan, vol. 22, no. 12, pp. 79-89, 1991.
[32] S.S. Fels and G.E. Hinton, “Glove-Talk: A Neural Network Interface between a Data-Glove and a Speech Synthesizer,” IEEE Trans. Neural Networks, vol. 4, no. 1, pp. 2-8, Jan. 1993.
[33] W.T. Freeman, K. Tanaka, J. Ohta, and K. Kyuma, "Computer Vision for Computer Games," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 100-105, Oct. 1996.
[34] W.T. Freeman and M. Roth, "Orientation Histograms for Hand Gesture Recognition," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, June 1995.
[35] W.T. Freeman and C.D. Weissman, "Television Control by Hand Gestures," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 179-183, June 1995.
[36] M. Fukumoto, Y. Suenaga, and K. Mase, "'Finger-Pointer': Pointing Interface by Image Processing," Computers and Graphics, vol. 18, no. 5, pp. 633-642, 1994.
[37] D.M. Gavrila and L.S. Davis, "Towards 3D Model-Based Tracking and Recognition of Human Movement: A Multi-View Approach," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 272-277, June 1995.
[38] H.P. Graf, E. Cosatto, D. Gibbon, M. Kocheisen, and E. Petajan, "Multi-Modal System for Locating Heads and Faces," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 88-93, Oct. 1996.
[39] G. D. Hager, Task Directed Sensor Fusion and Planning. Kluwer Academic Publishers, 1990.
[40] H. Harashima and F. Kishino, "Intelligent Image Coding and Communications With Realistic Sensations—Recent Trends," IEICE Trans., vol. E74, pp. 1,582-1,592, June 1991.
[41] A.G. Hauptmann and P. McAvinney, "Gesture With Speech for Graphics Manipulation," Int'l J. Man-Machine Studies, vol. 38, pp. 231-249, Feb. 1993.
[42] T. Heap and D. Hogg, "Towards 3D Hand Tracking Using a Deformable Model," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 140-145, Oct. 1996.
[43] E. Hunter, J. Schlenzig, and R. Jain, "Posture Estimation in Reduced-Model Gesture Input Systems," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, June 1995.
[44] K. Ishibuchi, H. Takemura, and F. Kishino, "Real Time Hand Gesture Recognition Using 3D Prediction Model," Proc. 1993 Int'l Conf. Systems, Man, and Cybernetics, Le Touquet, France, pp. 324-328, Oct. 17-20, 1993.
[45] S.X. Ju, M.J. Black, and Y. Yacoob, "Cardboard People: A Parameterized Model of Articulated Image Motion," Proc. Second Int'l Conf. Automatic Face and Gesture Recognition, pp. 38-44, Oct. 1996.
[46] I.A. Kakadiaris, D. Metaxas, and R. Bajcsy, "Active Part-Decomposition, Shape and Motion Estimations of Articulated Objects: A Physics-Based Approach," Proc. Computer Vision and Pattern Recognition (CVPR-94), pp. 980-984, Seattle, 1994.
[47] S.B. Kang and K. Ikeuchi, "Toward Automatic Robot Instruction for Perception—Recognizing a Grasp From Observation," IEEE Trans. Robotics and Automation, vol. 9, pp. 432-443, Aug. 1993.
[48] A. Kendon, "Current Issues in the Study of Gesture," The Biological Foundations of Gestures: Motor and Semiotic Aspects, J.-L. Nespoulous, P. Peron, and A. R. Lecours, eds., pp. 23-47. Lawrence Erlbaum Assoc., 1986.
[49] C. Kervrann and F. Heitz, "Learning Structure and Deformation Modes of Nonrigid Objects in Long Image Sequences," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, June 1995.
[50] R. Kjeldsen and J. Kender, "Visual Hand Gesture Recognition for Window System Control," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 184-188, June 1995.
[51] R. Kjeldsen and J. Kender, "Finding Skin in Color Images," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 312-317, Oct. 1996.
[52] R. Koch, "Dynamic 3D Scene Analysis through Synthesis Feedback Control," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 6, pp. 556-568, June 1993.
[53] M.W. Krueger, Artificial Reality II. Addison-Wesley, 1991.
[54] M.W. Krueger, "Environmental Technology: Making the Real World Virtual," Comm. ACM, vol. 36, pp. 36-37, July 1993.
[55] J.J. Kuch, "Vision-Based Hand Modeling and Gesture Recognition for Human Computer Interaction," master's thesis, Univ. of Illinois at Urbana-Champaign, 1994.
[56] J.J. Kuch and T.S. Huang, "Vision-Based Hand Modeling and Tracking," Proc. IEEE Int'l Conf. Computer Vision, Cambridge, Mass., June 1995.
[57] Y. Kuno, M. Sakamoto, K. Sakata, and Y. Shirai, "Vision-Based Human Computer Interface With User Centered Frame," Proc. IROS'94, 1994.
[58] A. Lanitis, C.J. Taylor, T.F. Cootes, and T. Ahmed, "Automatic Interpretation of Human Faces and Hand Gestures Using Flexible Models," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 98-103, June 1995.
[59] J. Lee and T.L. Kunii, "Constraint-Based Hand Animation," Models and Techniques in Computer Animation, pp. 110-127. Tokyo: Springer-Verlag, 1993.
[60] J. Lee and T.L. Kunii, "Model-Based Analysis of Hand Posture," IEEE Computer Graphics and Applications, pp. 77-86, Sept. 1995.
[61] E.T. Levy and D. McNeill, "Speech, Gesture, and Discourse," Discourse Processes, no. 15, pp. 277-301, 1992.
[62] C. Maggioni, “A Novel Gestural Input Device for Virtual Reality,” Proc. IEEE Virtual Reality Ann. Int'l Symp., pp. 118-124, Seattle, Wash., 1993.
[63] C. Maggioni, "GestureComputer—New Ways of Operating a Computer," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 166-171, June 1995.
[64] N. Magnenat-Thalmann and D. Thalmann, Computer Animation: Theory and Practice, 2nd rev. ed. New York: Springer-Verlag, 1990.
[65] D. McNeill and E. Levy, "Conceptual Representations in Language Activity and Gesture," Speech, Place and Action: Studies in Deixis and Related Topics, J. Jarvella and W. Klein, eds. Wiley, 1982.
[66] A. Meyering and H. Ritter, "Learning to Recognize 3D-Hand Postures From Perspective Pixel Images," Artificial Neural Networks 2, I. Alexander and J. Taylor, eds. North-Holland: Elsevier Science Publishers B.V., 1992.
[67] B. Moghaddam and A. Pentland, "Maximum Likelihood Detection of Faces and Hands," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 122-128, June 1995.
[68] J. O'Rourke and N.L. Badler, "Model-Based Image Analysis of Human Motion Using Constraint Propagation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 2, pp. 522-536, 1980.
[69] V. Pavlovic, R. Sharma, and T. Huang, "Gestural Interface to a Visual Computing Environment for Molecular Biologists," Proc. IEEE Int'l Conf. Face and Gesture Recognition, pp. 30-35, Killington, Vt., Oct. 1996.
[70] D.L. Quam, "Gesture Recognition With a DataGlove," Proc. 1990 IEEE National Aerospace and Electronics Conf., vol. 2, 1990.
[71] F.K.H. Quek, "Toward a Vision-Based Hand Gesture Interface," Virtual Reality Software and Technology Conf., pp. 17-31, Aug. 1994.
[72] F.K.H. Quek, "Eyes in the Interface," Image and Vision Computing, vol. 13, Aug. 1995.
[73] F.K.H. Quek, T. Mysliwiec, and M. Zhao, "Finger Mouse: A Freehand Pointing Interface," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 372-377, June 1995.
[74] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.
[75] L.R. Rabiner and B. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, N.J.: Prentice Hall, 1993.
[76] J.M. Rehg and T. Kanade, "DigitEyes: Vision-Based Human Hand Tracking," Technical Report CMU-CS-93-220, School of Computer Science, Carnegie Mellon Univ., 1993.
[77] J.M. Rehg and T. Kanade, “Model-Based Tracking of Self-Occluding Articulated Objects,” Proc. Fifth Int'l Conf. Computer Vision, pp. 612–617, June 1995.
[78] H. Rheingold, Virtual Reality. Summit Books, 1991.
[79] J. Schlenzig, E. Hunter, and R. Jain, “Vision Based Hand Gesture Interpretation Using Recursive Estimation,” Proc. 28th Asilomar Conf. Signals, Systems, and Computers, 1994.
[80] J. Schlenzig, E. Hunter, and R. Jain, "Recursive Identification of Gesture Inputs Using Hidden Markov Models," Proc. Second IEEE Workshop on Applications of Computer Vision, Sarasota, Fla., pp. 187-194, Dec. 5-7, 1994.
[81] J. Segen, "Controlling Computers With Gloveless Gestures," Proc. Virtual Reality Systems, Apr. 1993.
[82] R. Sharma, "Active Vision for Visual Servoing: A Review," IEEE Workshop on Visual Servoing: Achievements, Applications and Open Problems, May 1994.
[83] R. Sharma, T.S. Huang, and V.I. Pavlovic, "A Multimodal Framework for Interacting With Virtual Environments," Human Interaction With Complex Systems, C.A. Ntuen and E.H. Park, eds., pp. 53-71. Kluwer Academic Publishers, 1996.
[84] R. Sharma, T.S. Huang, V.I. Pavlovic, Y. Zhao, Z. Lo, S. Chu, K. Schulten, A. Dalke, J. Phillips, M. Zeller, and W. Humphrey, "Speech/Gesture Interface to a Visual Computing Environment for Molecular Biologists," Proc. Int'l Conf. Pattern Recognition, 1996.
[85] K. Shirai and S. Furui, "Special Issue on Spoken Dialogue," Speech Communication, vol. 15, pp. 3-4, 1994.
[86] T.E. Starner and A. Pentland, "Visual Recognition of American Sign Language Using Hidden Markov Models," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 189-194, June 1995.
[87] J. Streeck, "Gesture as Communication I: Its Coordination With Gaze and Speech," Communication Monographs, vol. 60, pp. 275-299, Dec. 1993.
[88] D. Sturman and D. Zeltzer, "A Survey of Glove-Based Input," IEEE Computer Graphics and Applications, vol. 14, no. 1, pp. 30-39, Jan. 1994.
[89] D.X. Sun and L. Deng, "Nonstationary Hidden Markov Models for Speech Recognition," Image Models (and Their Speech Model Cousins), S.E. Levinson and L. Shepp, eds., pp. 161-182. New York: Springer-Verlag, 1996.
[90] D.L. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, Aug. 1996.
[91] K. Tarabanis, P. Allen, and R. Tsai, "A Survey of Sensor Planning in Computer Vision," IEEE Trans. Robotics and Automation, vol. 11, pp. 86-104, 1995.
[92] D. Thompson, "Biomechanics of the Hand," Perspectives in Computing, vol. 1, pp. 12-19, Oct. 1981.
[93] Y.A. Tijerino, K. Mochizuki, and F. Kishino, "Interactive 3D Computer Graphics Driven Through Verbal Instructions: Previous and Current Activities at ATR," Computers and Graphics, vol. 18, no. 5, pp. 621-631, 1994.
[94] A. Torige and T. Kono, "Human-Interface by Recognition of Human Gestures With Image Processing. Recognition of Gesture to Specify Moving Directions," IEEE Int'l Workshop on Robot and Human Communication, pp. 105-110, 1992.
[95] R. Tubiana, ed., The Hand, vol. 1. Philadelphia, Penn.: Saunders, 1981.
[96] C. Uras and A. Verri, "Hand Gesture Recognition From Edge Maps," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 116-121, June 1995.
[97] R. Vaillant and D. Darmon, "Vision-Based Hand Pose Estimation," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 356-361, June 1995.
[98] M.T. Vo, R. Houghton, J. Yang, U. Bub, U. Meier, A. Waibel, and P. Duchnowski, "Multimodal Learning Interfaces," ARPA Spoken Language Technology Workshop 1995, Jan. 1995.
[99] M.T. Vo and A. Waibel, "A Multi-Modal Human-Computer Interface: Combination of Gesture and Speech Recognition," Adjunct Proc. InterCHI'93, Apr. 26-29, 1993.
[100] A. Waibel and K.F. Lee, Readings in Speech Recognition. Morgan Kaufmann, 1990.
[101] C. Wang and D.J. Cannon, "A Virtual End-Effector Pointing System in Point-and-Direct Robotics for Inspection of Surface Flaws Using a Neural Network-Based Skeleton Transform," Proc. IEEE Int'l Conf. Robotics and Automation, vol. 3, pp. 784-789, May 1993.
[102] K. Watanuki, K. Sakamoto, and F. Togawa, "Multimodal Interaction in Human Communication," IEICE Trans. Information and Systems, vol. E78-D, pp. 609-614, June 1995.
[103] A.D. Wilson and A.F. Bobick, "Configuration States for the Representation and Recognition of Gesture," Proc. Int'l Workshop on Automatic Face and Gesture Recognition, Zurich, Switzerland, pp. 129-134, June 1995.
[104] A.D. Wilson and A.F. Bobick, "Recovering the Temporal Structure of Natural Gestures," Proc. Int'l Conf. Automatic Face and Gesture Recognition, Killington, Vt., pp. 66-71, Oct. 1996.
[105] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-Time Tracking of the Human Body," Proc. Second Int'l Conf. Automatic Face and Gesture Recognition, pp. 51-56, Oct. 1996.

Index Terms:
Vision-based gesture recognition, gesture analysis, hand tracking, nonrigid motion analysis, human-computer interaction.
Citation:
Vladimir I. Pavlovic, Rajeev Sharma, Thomas S. Huang, "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 677-695, July 1997, doi:10.1109/34.598226