Event Detection and Analysis from Video Streams
August 2001 (vol. 23 no. 8)
pp. 873-889

Abstract—We present a system that takes as input a video stream obtained from an airborne moving platform and produces an analysis of the behavior of the moving objects in the scene. To achieve this functionality, our system relies on two modular blocks. The first detects and tracks moving regions in the sequence. It uses a set of features at multiple scales to stabilize the image sequence, that is, to compensate for the motion of the observer, then extracts regions with residual motion and uses an attribute graph representation to infer their trajectories. The second module takes these trajectories as input, together with user-provided information in the form of geospatial context and goal context, to instantiate likely scenarios. We present details of the system, together with results on a number of real video sequences, and also provide a quantitative analysis of the results.
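The two-module pipeline in the abstract can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the paper fits an affine motion model to features at multiple scales, whereas the sketch estimates pure translation from feature correspondences; the threshold, event names, and scenario encoding are all hypothetical.

```python
# Illustrative sketch of the two-module pipeline; all names and
# thresholds are assumptions, not the paper's actual implementation.
from statistics import median

# --- Module 1: stabilization and residual-motion detection -----------
# (The paper fits an affine model to multi-scale features; here we
# estimate a pure translation as a simplification.)
def estimate_shift(matches):
    """matches: list of ((x0, y0), (x1, y1)) feature correspondences
    between the previous and current frames."""
    dx = int(round(median(x1 - x0 for (x0, _), (x1, _) in matches)))
    dy = int(round(median(y1 - y0 for (_, y0), (_, y1) in matches)))
    return dx, dy

def residual_motion(prev, curr, shift, thresh=10):
    """Compensate the estimated camera motion, then flag pixels whose
    intensity still changed: these carry the residual motion of
    independently moving objects. Frames are 2D lists of intensities."""
    dx, dy = shift
    h, w = len(curr), len(curr[0])
    moving = set()
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # this pixel's location in prev
            if 0 <= sy < h and 0 <= sx < w and \
               abs(curr[y][x] - prev[sy][sx]) > thresh:
                moving.add((y, x))
    return moving

# --- Module 2: scenario recognition over trajectories ----------------
# A scenario is matched by a finite automaton that advances over
# symbolic events derived from trajectories and geospatial context.
def recognize(events, scenario):
    """scenario: tuple of event names that must occur in this order;
    returns True if the automaton reaches its final state."""
    state = 0
    for ev in events:
        if state < len(scenario) and ev == scenario[state]:
            state += 1
    return state == len(scenario)
```

For example, `recognize(["approach", "enter_zone", "stop"], ("enter_zone", "stop"))` accepts, while reordering the events rejects; the zone names here stand in for the user-provided geospatial context.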

[1] A. Bobick and Y.A. Ivanov, “Action Recognition Using Probabilistic Parsing,” IEEE Proc. Computer Vision and Pattern Recognition, June 1998.
[2] A.F. Bobick, A.P. Pentland, and T. Poggio, “VSAM at the MIT Media Laboratory and CBCL: Learning and Understanding Action in Video Imagery PI Report 1998,” Proc. DARPA Image Understanding Workshop, pp. 85-91, 1998.
[3] M. Brand, N. Oliver, and A. Pentland, “Coupled Hidden Markov Models for Complex Action Recognition,” IEEE Proc. Computer Vision and Pattern Recognition, 1997.
[4] F. Brémond and G. Medioni, “Scenario Recognition in Airborne Video Imagery,” Proc. DARPA Image Understanding Workshop, 1998.
[5] F. Brémond and M. Thonnat, “Issues of Representing Context Illustrated by Video-Surveillance Applications,” Int'l J. Human-Computer Studies, special issue on context, 1998.
[6] H. Buxton and S. Gong, “Visual Surveillance in a Dynamic and Uncertain World,” Artificial Intelligence, vol. 78, nos. 1-2, pp. 431-459, 1995.
[7] I. Cohen and I. Herlin, “Non Uniform Multiresolution Method for Optical Flow and Phase Portrait Models: Environmental Applications,” Int'l J. Computer Vision, vol. 33, no. 1, pp. 1-22, 1999.
[8] I. Cohen and G. Medioni, “Detecting and Tracking Moving Objects in Video from an Airborne Observer,” Proc. DARPA Image Understanding Workshop, 1998.
[9] I. Cohen and G. Medioni, “Detecting and Tracking Moving Objects for Video Surveillance,” IEEE Proc. Computer Vision and Pattern Recognition, June 1999.
[10] D. Corrall, “Deliverable 3: Visual Monitoring and Surveillance of Wide-Area Outdoor Scenes,” Technical Report Esprit Project 2152: VIEWS, June 1992.
[11] I.J. Cox and S.L. Hingorani, “An Efficient Implementation of Reid's Multiple Hypothesis Tracking Algorithm and Its Evaluation for the Purpose of Visual Tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 2, pp. 138-150, Feb. 1996.
[12] I.J. Cox and M.L. Miller, “On Finding Ranked Assignments with Application to Multi-Target Tracking and Motion Correspondence,” AeroSys, vol. 32, no. 1, pp. 486-489, Jan. 1995.
[13] J.W. Davis and A.F. Bobick, “The Representation and Recognition of Human Movement Using Temporal Templates,” IEEE Proc. Computer Vision and Pattern Recognition, pp. 928-934, June 1997.
[14] L. Davis, R. Chellappa, A. Rosenfeld, D. Harwood, I. Haritaoglu, and R. Cutler, “Visual Surveillance and Monitoring,” Proc. DARPA Image Understanding Workshop, pp. 73-76, 1998.
[15] O.D. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, Mass.: MIT Press, 1993.
[16] M.A. Fischler and R.C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Comm. ACM, vol. 24, no. 6, pp. 381-395, June 1981.
[17] B. Flinchbaugh, “Reliable Video Event Recognition for Network Cameras,” Proc. DARPA Image Understanding Workshop, pp. 81-83, 1998.
[18] A. Galton, “Towards an Integrated Logic of Space, Time and Motion,” Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI), Aug. 1993.
[19] W.E.L. Grimson, L. Lee, R. Romano, and C. Stauffer, “Using Adaptive Tracking to Classify and Monitor Activities in a Site,” IEEE Proc. Computer Vision and Pattern Recognition, pp. 22-31, 1998.
[20] I. Haritaoglu, D. Harwood, and L.S. Davis, “W4S: A Real-Time System for Detecting and Tracking People in 2 1/2-D,” Proc. European Conf. Computer Vision, 1998.
[21] C. Harris and M.J. Stephens, “A Combined Corner and Edge Detector,” Proc. Alvey Vision Conf., pp. 147-152, 1988.
[22] G. Herzog, “From Visual Input to Verbal Output in the Visual Translator,” Project VITRA Report 124, Universität des Saarlandes, Saarbrücken, Germany, 1995.
[23] S. Hongeng, F. Bremond, and R. Nevatia, “Representation and Optimal Recognition of Human Activities,” IEEE Proc. Computer Vision and Pattern Recognition, 2000.
[24] R.J. Howarth and H. Buxton, “Visual Surveillance Monitoring and Watching,” Proc. European Conf. Computer Vision, vol. II, pp. 321-334, 1996.
[25] D.P. Huttenlocher, J.J. Noh, and W.J. Rucklidge, “Tracking Non-Rigid Objects in Complex Scenes,” Proc. IEEE Int'l Conf. Computer Vision, 1993.
[26] S. Intille and A. Bobick, “Visual Recognition of Multi-Agent Action Using Binary Temporal Relations,” IEEE Proc. Computer Vision and Pattern Recognition, June 1999.
[27] S.S. Intille, J.W. Davis, and A.F. Bobick, “Real Time Closed World Tracking,” IEEE Proc. Computer Vision and Pattern Recognition, pp. 697-703, 1997.
[28] M. Irani and P. Anandan, “Robust Multi-Sensor Image Alignment,” Proc. DARPA Image Understanding Workshop, vol. 1, pp. 639-647, May 1997.
[29] M. Irani and P. Anandan, “A Unified Approach to Moving Object Detection in 2D and 3D Scenes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 6, pp. 577-589, June 1998.
[30] M. Irani, P. Anandan, and S. Hsu, “Mosaic Based Representations of Video Sequences and Their Applications,” Proc. Fifth Int'l Conf. Computer Vision, pp. 605-611, June 1995.
[31] M. Irani, B. Rousso, and S. Peleg, “Detecting and Tracking Multiple Moving Objects Using Temporal Integration,” Proc. European Conf. Computer Vision, pp. 282-287, May 1992.
[32] T. Kanade, R.T. Collins, A.J. Lipton, P. Burt, and L. Wixon, “Advances in Cooperative Multi-Sensor Video Surveillance,” Proc. DARPA Image Understanding Workshop, pp. 3-24, 1998.
[33] A.J. Lipton, H. Fujiyoshi, and R.S. Patil, “Moving Target Classification and Tracking from Real Time Video,” Proc. Fourth IEEE Workshop Applications of Computer Vision '98, pp. 8-14, 1998.
[34] C. Morimoto and R. Chellappa, “Fast 3D Stabilization and Mosaic Construction,” IEEE Proc. Computer Vision and Pattern Recognition, pp. 660-665, June 1997.
[35] J.R. Muller, P. Anandan, and J.R. Bergen, “Adaptive-Complexity Registration of Images,” IEEE Proc. Computer Vision and Pattern Recognition, pp. 953-957, 1994.
[36] H.H. Nagel, “From Image Sequences Towards Conceptual Descriptions,” Image and Vision Computing, vol. 6, no. 2, pp. 59-74, May 1988.
[37] B. Neumann, Semantic Structures: Advances in Natural Language Processing, D.L. Waltz, ed., chapter 5, pp. 167-206. Hillsdale, N.J.: Lawrence Erlbaum, 1989.
[38] S. Peleg and H. Rom, “Motion Based Segmentation,” IEEE Proc. Int'l Conf. Pattern Recognition, vol. 1, pp. 109-113, 1990.
[39] C. Pinhanez and A. Bobick, “Human Action Detection Using PNF Propagation of Temporal Constraints,” IEEE Proc. Computer Vision and Pattern Recognition, June 1998.
[40] D.B. Reid, “An Algorithm for Tracking Multiple Targets,” IEEE Trans. Automatic Control, vol. 24, no. 6, pp. 843-854, Dec. 1979.
[41] C. Schmid, R. Mohr, and C. Bauckhage, “Comparing and Evaluating Interest Points,” IEEE Proc. Int'l Conf. Computer Vision, pp. 230-235, 1998.
[42] T. Starner and A. Pentland, “Visual Recognition of American Sign Language Using Hidden Markov Models,” Proc. Int'l Workshop Automatic Face- and Gesture-Recognition, 1995.
[43] T. Strat, “Employing Contextual Information in Computer Vision,” Proc. DARPA Image Understanding Workshop, pp. 217-229, 1993.
[44] R. Szeliski, “Image Mosaicing for Tele-Reality Applications,” IEEE Computer Graphics and Applications, 1996.
[45] R. Szeliski and H.-Y. Shum, “Creating Full View Panoramic Image Mosaics and Environment Maps,” Proc. Computer Graphics, Ann. Conf. Series, vol. 8, pp. 251-258, 1997.
[46] D. Wilson and A. Bobick, “Nonlinear PHMMs for the Interpretation of Parameterized Gesture,” IEEE Proc. Computer Vision and Pattern Recognition, June 1998.
[47] R. Zabih and J. Woodfill, “Non-Parametric Local Transforms for Computing Visual Correspondence,” Proc. European Conf. Computer Vision, May 1994.
[48] I. Zoghlami, O. Faugeras, and R. Deriche, “Using Geometric Corners to Build a 2D Mosaic from a Set of Images,” IEEE Conf. Computer Vision and Pattern Recognition, pp. 420-425, June 1997.

Index Terms:
Detection and tracking of moving objects, egomotion estimation, affine stabilization, mosaics, graph representation of object trajectories, event analysis, geospatial and mission contexts, scenario recognition, finite automaton.
Citation:
Gérard Medioni, Isaac Cohen, François Brémond, Somboon Hongeng, Ramakant Nevatia, "Event Detection and Analysis from Video Streams," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 8, pp. 873-889, Aug. 2001, doi:10.1109/34.946990