Multiobject Behavior Recognition by Event Driven Selective Attention Method
August 2000 (vol. 22 no. 8)
pp. 873-887
Abstract—Recognizing multiple object behaviors from nonsegmented image sequences is a difficult problem because most motion recognition methods proposed so far share the limitation of the single-object assumption. Based on existing methods, the problem can be solved only by bottom-up image sequence segmentation followed by sequence classification. This straightforward approach depends entirely on bottom-up segmentation, which is easily affected by occlusions and outliers. This paper presents a completely novel approach for this task without using bottom-up segmentation. Our approach is based on assumption generation and verification, i.e., feasible assumptions about the present behaviors consistent with the input image and behavior models are dynamically generated and verified by finding their supporting evidence in input images. This can be realized by an architecture called the selective attention model, which consists of a state-dependent event detector and an event sequence analyzer. The former detects image variation (event) in a limited image region (focusing region), which is not affected by occlusions and outliers. The latter analyzes sequences of detected events and activates all feasible states representing assumptions about multiobject behaviors. In this architecture, event detection can be regarded as a verification process of generated assumptions because each focusing region is determined by the corresponding assumption. This architecture is sound since all feasible assumptions are generated. However, these redundant assumptions imply ambiguity of the recognition result. Hence, we further extend the system by introducing 1) colored-token propagation to discriminate different objects in state space and 2) integration of multiviewpoint image sequences to disambiguate the single-view recognition results. Extensive experiments on human behavior recognition in real world environments demonstrate the soundness and robustness of our architecture.
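The interplay between the state-dependent event detector and the event sequence analyzer can be illustrated with a small nondeterministic state machine. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the behavior model `TRANSITIONS`, the region names, the representation of a frame as a mapping from region name to a boolean "variation detected" flag, and the choice to keep source states active after a transition are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical frame representation: region name -> image variation seen there.
Frame = Dict[str, bool]

# Hypothetical behavior model: state -> [(focusing region, next state), ...].
TRANSITIONS: Dict[str, List[Tuple[str, str]]] = {
    "outside": [("door", "at_door")],
    "at_door": [("desk", "at_desk"), ("shelf", "at_shelf")],
    "at_desk": [],
    "at_shelf": [],
}


def detect_events(active: Set[str], frame: Frame) -> Set[Tuple[str, str]]:
    """State-dependent event detector.

    Only the focusing regions attached to currently active states are
    inspected, so variation elsewhere in the frame is never looked at.
    """
    fired = set()
    for state in active:
        for region, next_state in TRANSITIONS[state]:
            if frame.get(region, False):
                fired.add((state, next_state))
    return fired


def step(active: Set[str], frame: Frame) -> Set[str]:
    """Event sequence analyzer: activate every state supported by an event.

    Assumption of this sketch: source states stay active, because a single
    event cannot tell us whether another object remained in that state.
    """
    next_active = set(active)
    for _src, dst in detect_events(active, frame):
        next_active.add(dst)
    return next_active


def recognize(frames: List[Frame], start: str = "outside") -> Set[str]:
    """Run the analyzer over a frame sequence and return all feasible states."""
    active = {start}
    for frame in frames:
        active = step(active, frame)
    return active


if __name__ == "__main__":
    frames = [{"door": True}, {"desk": True, "window": True}]
    print(sorted(recognize(frames)))
```

In this toy run the variation in the hypothetical `window` region is ignored because no active state has a focusing region there, which mirrors the abstract's point that detection is confined to regions determined by the current assumptions. The resulting set holds several simultaneously feasible states, which is the ambiguity that the colored-token propagation and multiviewpoint integration described in the abstract are meant to reduce.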
Index Terms:
Behavior recognition, HMM, nondeterministic finite automata, selective attention mechanism, token propagation, multiviewpoint image.
Citation:
Toshikazu Wada, Takashi Matsuyama, "Multiobject Behavior Recognition by Event Driven Selective Attention Method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 873-887, Aug. 2000, doi:10.1109/34.868687