Issue No. 10, October 2009 (vol. 31)
pp. 1775-1789
Abhinav Gupta , University of Maryland, College Park
Aniruddha Kembhavi , University of Maryland, College Park
Larry S. Davis , University of Maryland, College Park
ABSTRACT
Interpreting images and videos that contain humans interacting with different objects is a daunting task. It involves understanding the scene or event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movements on those objects. While each of these perceptual tasks can be conducted independently, recognition rates improve when interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach that integrates the various perceptual tasks involved in understanding human-object interactions. Previous approaches to object and action recognition rely on static shape/appearance feature matching and motion analysis, respectively. Our approach goes beyond these traditional approaches and applies spatial and functional constraints to each of the perceptual elements to produce a coherent semantic interpretation. Such constraints allow us to recognize objects and actions when appearance alone is not discriminative enough. We also demonstrate the use of such constraints for recognizing actions in static images without using any motion information.
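To make the flavor of this Bayesian integration concrete, here is a minimal sketch (not the authors' actual model) of how a confident action likelihood, an ambiguous object likelihood, and a functional/spatial compatibility term can be fused into a joint posterior. All class names, likelihood values, and compatibility entries below are invented assumptions for illustration.

```python
# Toy sketch of Bayesian fusion of perceptual cues for human-object
# interaction recognition. NOT the paper's model: every class name,
# likelihood, and compatibility value here is hypothetical.
import numpy as np

actions = ["drink", "spray"]           # hypothetical action hypotheses
objects = ["cup", "spray_bottle"]      # hypothetical object hypotheses

# Likelihoods from independent detectors (assumed values):
# the motion cue favors "drink", the appearance cue is fully ambiguous.
p_e_given_action = np.array([0.7, 0.3])
p_e_given_object = np.array([0.5, 0.5])

# Functional/spatial compatibility phi(action, object): how plausible it
# is that the action manipulates the object (assumed values).
compatibility = np.array([[0.9, 0.1],   # drink: cup likely, bottle not
                          [0.1, 0.9]])  # spray: bottle likely, cup not

# Joint posterior over (action, object) with uniform priors:
# p(a, o | e) is proportional to p(e_a | a) * p(e_o | o) * phi(a, o).
joint = p_e_given_action[:, None] * p_e_given_object[None, :] * compatibility
joint /= joint.sum()

# Marginals: the compatibility term lets the confident action cue
# disambiguate the object that appearance alone could not.
print("p(action):", dict(zip(actions, joint.sum(axis=1).round(3))))
print("p(object):", dict(zip(objects, joint.sum(axis=0).round(3))))
```

Running this, the object posterior moves from the 50/50 appearance cue to roughly 0.66/0.34 in favor of the cup, mirroring the abstract's claim that spatial and functional constraints allow recognition when appearance is not discriminative enough.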
INDEX TERMS
Action recognition, object recognition, functional recognition.
CITATION
Abhinav Gupta, Aniruddha Kembhavi, Larry S. Davis, "Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 10, pp. 1775-1789, Oct. 2009, doi:10.1109/TPAMI.2009.83.
REFERENCES
[1] A. Agarwal and B. Triggs, “3D Human Pose from Silhouettes by Relevance Vector Regression,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2004.
[2] P. Bach, G. Knoblich, T. Gunter, A. Friederici, and W. Prinz, “Action Comprehension: Deriving Spatial and Functional Relations,” J. Experimental Psychology Human Perception and Performance, vol. 31, no. 3, pp. 465-479, 2005.
[3] A. Berg and J. Malik, “Geometric Blur for Template Matching,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[4] A. Blake, C. Rother, M. Brown, P. Perez, and P. Torr, “Interactive Image Segmentation Using an Adaptive GMMRF Model,” Proc. European Conf. Computer Vision, 2004.
[5] A. Bobick and A. Wilson, “A State-Based Approach to the Representation and Recognition of Gesture,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 12, pp. 1325-1337, Dec. 1997.
[6] A. Bosch, A. Zisserman, and X. Muñoz, “Image Classification Using Random Forests and Ferns,” Proc. IEEE Int'l Conf. Computer Vision, 2007.
[7] D. Bub and M. Masson, “Gestural Knowledge Evoked by Objects as Part of Conceptual Representations,” Aphasiology, vol. 20, pp. 1112-1124, 2006.
[8] L.L. Chao and A. Martin, “Representation of Manipulable Man-Made Objects in the Dorsal Stream,” NeuroImage, vol. 12, pp. 478-484, 2000.
[9] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[10] J. Davis, H. Gao, and V. Kannappan, “A Three-Mode Expressive Feature Model of Action Effort,” Proc. IEEE Workshop Motion and Video Computing, 2002.
[11] Z. Duric, J. Fayman, and E. Rivlin, “Function from Motion,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 6, pp. 579-591, June 1996.
[12] P. Felzenszwalb and D. Huttenlocher, “Pictorial Structures for Object Recognition,” Int'l J. Computer Vision, vol. 61, pp. 55-79, 2005.
[13] V. Ferrari, M. Marin, and A. Zisserman, “Progressive Search Space Reduction for Human Pose Estimation,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[14] R. Filipovych and E. Ribeiro, “Recognizing Primitive Interactions by Exploring Actor-Object States,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[15] V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti, “Action Recognition in the Premotor Cortex,” Brain, vol. 119, pp. 593-609, 1996.
[16] G. Guerra and Y. Aloimonos, “Discovering a Language for Human Activity,” Proc. Assoc. Advancement of Artificial Intelligence Workshop Anticipation in Cognitive Systems, 2005.
[17] A. Gupta, T. Chen, F. Chen, D. Kimber, and L. Davis, “Context and Observation Driven Latent Variable Model for Human Pose Estimation,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[18] A. Gupta and L. Davis, “Objects in Action: An Approach for Combining Action Understanding and Object Perception,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[19] A. Gupta and L. Davis, “Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers,” Proc. European Conf. Computer Vision, 2008.
[20] A. Gupta, A. Mittal, and L. Davis, “Constraint Integration for Efficient Multiview Pose Estimation with Self-Occlusions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 493-506, Mar. 2008.
[21] A. Gupta, J. Shi, and L. Davis, “A 'Shape Aware' Model for Semi-Supervised Learning of Objects and Its Context,” Proc. Conf. Neural Information Processing Systems, 2008.
[22] A. Gupta, P. Srinivasan, J. Shi, and L. Davis, “Understanding Videos, Constructing Plots—Learning a Visually Grounded Storyline Model from Annotated Videos,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[23] H.B. Helbig, M. Graf, and M. Kiefer, “The Role of Action Representations in Visual Object Recognition,” Experimental Brain Research, vol. 174, pp. 221-228, 2006.
[24] D. Hoiem, A. Efros, and M. Hebert, “Putting Objects in Perspective,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[25] S.H. Johnson-Frey, F.R. Maloof, R. Newman-Norlund, C. Farrer, S. Inati, and S.T. Grafton, “Actions or Hand-Object Interactions? Human Inferior Frontal Cortex and Action Observation,” Neuron, vol. 39, pp. 1053-1058, 2003.
[26] Z. Kourtzi, “But Still, It Moves,” Trends in Cognitive Sciences, vol. 8, pp. 47-49, 2004.
[27] Z. Kourtzi and N. Kanwisher, “Activation in Human MT/MST by Static Images with Implied Motion,” J. Cognitive Neuroscience, vol. 12, pp. 48-55, 2000.
[28] Y. Kuniyoshi and M. Shimozaki, “A Self-Organizing Neural Model for Context-Based Action Recognition,” Proc. IEEE Eng. Medicine and Biology Soc. Conf. Neural Eng., 2003.
[29] L.-J. Li and L. Fei-Fei, “What, Where and Who? Classifying Events by Scene and Object Recognition,” Proc. IEEE Int'l Conf. Computer Vision, 2007.
[30] R. Mann, A. Jepson, and J. Siskind, “The Computational Perception of Scene Dynamics,” Computer Vision and Image Understanding, vol. 65, no. 2, pp. 113-128, 1997.
[31] R. Marteniuk, C. MacKenzie, M. Jeannerod, S. Athenes, and C. Dugas, “Constraints on Human Arm Movement Trajectories,” Canadian J. Psychology, vol. 41, pp. 365-378, 1987.
[32] T.B. Moeslund, A. Hilton, and V. Kruger, “A Survey of Advances in Vision-Based Human Motion Capture and Analysis,” Computer Vision and Image Understanding, vol. 104, nos. 2/3, pp. 90-126, 2006.
[33] D. Moore, I. Essa, and M. Hayes, “Exploiting Human Action and Object Context for Recognition Tasks,” Proc. IEEE Int'l Conf. Computer Vision, 1999.
[34] H. Murase and S. Nayar, “Learning Object Models from Appearance,” Proc. Nat'l Conf. Artificial Intelligence, 1993.
[35] K. Murphy, A. Torralba, and W. Freeman, “Graphical Model for Scenes and Objects,” Proc. Conf. Neural Information Processing Systems, 2003.
[36] K. Murphy, A. Torralba, and W. Freeman, “Using the Forest to See the Trees: A Graphical Model Relating Features, Objects and Scenes,” Proc. Conf. Neural Information Processing Systems, 2004.
[37] H. Nagel, “From Image Sequences towards Conceptual Descriptions,” Image and Vision Computing, vol. 6, no. 2, pp. 59-74, 1988.
[38] K. Nelissen, G. Luppino, W. Vanduffel, G. Rizzolatti, and G. Orban, “Observing Others: Multiple Action Representation in the Frontal Lobe,” Science, vol. 310, pp. 332-336, 2005.
[39] A. Oliva and A. Torralba, “Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope,” Int'l J. Computer Vision, vol. 42, pp. 145-175, 2001.
[40] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[41] P. Peursum, G. West, and S. Venkatesh, “Combining Image Regions and Human Activity for Indirect Object Recognition in Indoor Wide Angle Views,” Proc. IEEE Int'l Conf. Computer Vision, 2005.
[42] V. Prasad, V. Kellokumpu, and L. Davis, “Ballistic Hand Movements,” Proc. Conf. Articulated Motion and Deformable Objects, 2006.
[43] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, “Objects in Context,” Proc. IEEE Int'l Conf. Computer Vision, 2007.
[44] C. Rao, A. Yilmaz, and M. Shah, “View-Invariant Representation and Recognition of Actions,” Int'l J. Computer Vision, vol. 50, no. 2, pp. 203-226, 2002.
[45] E. Rivlin, S. Dickinson, and A. Rosenfeld, “Recognition by Functional Parts,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994.
[46] M. Shah and R. Jain, Motion-Based Recognition. Kluwer Academic, 1997.
[47] I. Smyth and M. Wing, The Psychology of Human Movement. Academic Press, 1984.
[48] L. Stark and K. Bowyer, “Generic Recognition through Qualitative Reasoning about 3D Shape and Object Function,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1991.
[49] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky, “Learning Hierarchical Models of Scenes, Objects and Parts,” Proc. IEEE Int'l Conf. Computer Vision, 2005.
[50] J. Sullivan and S. Carlsson, “Recognizing and Tracking Human Action,” Proc. European Conf. Computer Vision, 2002.
[51] S. Todorovic and N. Ahuja, “Learning Subcategory Relevances for Category Recognition,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[52] A. Torralba and P. Sinha, “Statistical Context Priming for Object Detection,” Proc. IEEE Int'l Conf. Computer Vision, 2001.
[53] C. Urgesi, V. Moro, M. Candidi, and S. Aglioti, “Mapping Implied Body Actions in the Human Motor System,” J. Neuroscience, vol. 26, pp. 7942-7949, 2006.
[54] L. Vaina and M. Jaulent, “Object Structure and Action Requirements: A Compatibility Model for Functional Recognition,” Int'l J. Intelligent Systems, vol. 6, pp. 313-336, 1991.
[55] A. Vezhnevets and V. Vezhnevets, “‘Modest AdaBoost’—Teaching AdaBoost to Generalize Better,” Proc. Graphicon, 2005.
[56] Y. Wang, H. Jiang, M. Drew, Z. Li, and G. Mori, “Unsupervised Discovery of Action Classes,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[57] A. Wilson and A. Bobick, “Parametric Hidden Markov Models for Gesture Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 9, pp. 884-900, Sept. 1999.
[58] B. Wu and R. Nevatia, “Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors,” Proc. IEEE Int'l Conf. Computer Vision, 2005.
[59] B. Wu and R. Nevatia, “Detection and Tracking of Multiple Humans with Extensive Pose Articulation,” Proc. IEEE Int'l Conf. Computer Vision, 2007.
[60] J. Wu, A. Osuntogun, T. Choudhury, M. Philipose, and J. Rehg, “A Scalable Approach to Activity Recognition Based on Object Use,” Proc. IEEE Int'l Conf. Computer Vision, 2007.
[61] A. Yilmaz and M. Shah, “Actions Sketch: A Novel Action Representation,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[62] Q. Zhu, S. Avidan, M. Ye, and K. Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.