PADS: A Probabilistic Activity Detection Framework for Video Data
December 2010 (vol. 32 no. 12)
pp. 2246-2261
Massimiliano Albanese, University of Maryland, College Park
Rama Chellappa, University of Maryland, College Park
Naresh Cuntoor, Kitware Inc., Clifton Park
Vincenzo Moscato, Università di Napoli "Federico II", Napoli
Antonio Picariello, Università di Napoli "Federico II", Napoli
V.S. Subrahmanian, University of Maryland, College Park
Octavian Udrea, IBM T.J. Watson Research Center, Hawthorne
There is now a growing need to identify various kinds of activities that occur in videos. In this paper, we first present a logical language called Probabilistic Activity Description Language (PADL) in which users can specify activities of interest. We then develop a probabilistic framework which assigns to any subvideo of a given video sequence a probability that the subvideo contains the given activity, and we finally develop two fast algorithms to detect activities within this framework. OffPad finds all minimal segments of a video that contain a given activity with a probability exceeding a given threshold. In contrast, the OnPad algorithm examines a video during playout (rather than afterwards, as OffPad does) and computes the probability that a given activity is occurring, even if the activity is only partially complete. Our prototype Probabilistic Activity Detection System (PADS) implements the framework and the two algorithms, building on top of existing image processing algorithms. We have conducted detailed experiments and compared our approach to four different approaches presented in the literature. We show that, for complex activity definitions, our approach outperforms all the other approaches.
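
To make the notion of a "minimal segment" concrete, the following is a minimal brute-force sketch in Python. It is not the OffPad algorithm, which the abstract does not specify. It only illustrates the problem OffPad solves: finding every segment whose probability meets a threshold while no proper subsegment does. The names `minimal_segments` and `make_two_step_scorer` are hypothetical, and the toy scoring function (best product of two per-frame detection probabilities) is an assumption chosen for illustration, not the PADL probability model.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive (start_frame, end_frame)


def minimal_segments(
    n_frames: int,
    prob: Callable[[int, int], float],
    threshold: float,
) -> List[Segment]:
    """Return every segment whose score meets the threshold while no
    proper subsegment does. Brute force: O(n_frames^2) calls to prob."""
    # hit[s][e] is True when some subsegment of [s, e] (itself included)
    # meets the threshold.
    hit = [[False] * n_frames for _ in range(n_frames)]
    result: List[Segment] = []
    for length in range(1, n_frames + 1):
        for s in range(0, n_frames - length + 1):
            e = s + length - 1
            # Every proper subsegment lies inside [s+1, e] or [s, e-1].
            inner = length > 1 and (hit[s + 1][e] or hit[s][e - 1])
            meets = prob(s, e) >= threshold
            hit[s][e] = inner or meets
            if meets and not inner:
                result.append((s, e))
    return sorted(result)


def make_two_step_scorer(
    p_a: Sequence[float], p_b: Sequence[float]
) -> Callable[[int, int], float]:
    """Hypothetical toy model: an activity is action A followed later by
    action B; a segment's score is the best p_a[i] * p_b[j] with i < j."""
    def prob(s: int, e: int) -> float:
        best = 0.0
        for i in range(s, e + 1):
            for j in range(i + 1, e + 1):
                best = max(best, p_a[i] * p_b[j])
        return best
    return prob


if __name__ == "__main__":
    # Made-up per-frame detection probabilities for a 5-frame video.
    p_a = [0.9, 0.1, 0.1, 0.8, 0.1]
    p_b = [0.1, 0.1, 0.9, 0.1, 0.7]
    scorer = make_two_step_scorer(p_a, p_b)
    print(minimal_segments(5, scorer, threshold=0.5))  # [(0, 2), (3, 4)]
```

In this toy run the segment (0, 4) also exceeds the threshold, but it is not reported because it contains the shorter qualifying segment (0, 2). An online variant in the spirit of OnPad would instead update a running score as each frame arrives, which this sketch does not attempt.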

Index Terms:
Applications and expert knowledge-intensive systems, computer vision, vision and scene understanding, video analysis, image processing and computer vision, applications.
Massimiliano Albanese, Rama Chellappa, Naresh Cuntoor, Vincenzo Moscato, Antonio Picariello, V.S. Subrahmanian, Octavian Udrea, "PADS: A Probabilistic Activity Detection Framework for Video Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2246-2261, Dec. 2010, doi:10.1109/TPAMI.2010.33