Issue No.03 - March (2009 vol.31)
pp: 539-555
Xiaogang Wang, MIT, Cambridge
Xiaoxu Ma, MIT, Cambridge
W.E.L. Grimson, MIT, Cambridge
ABSTRACT
We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learnt in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models: the Latent Dirichlet Allocation (LDA) mixture model, the Hierarchical Dirichlet Process (HDP) mixture model, and the Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They extend existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking or human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest, such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.
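The two-level structure described in the abstract (interactions as distributions over atomic activities, atomic activities as distributions over low-level visual features) can be illustrated with a small generative sketch. This is a toy illustration only, not the authors' implementation: the vocabulary size, the number of activities and interactions, the Dirichlet parameters, the uniform choice of interaction, and the function names `sample_dirichlet` and `sample_clip` are all assumptions introduced here. It assumes visual features have already been quantized into a discrete codebook of "visual words".

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_clip(interaction_priors, activity_word_dists, n_words, rng):
    """Generate one video clip as a bag of visual words.

    interaction_priors: one Dirichlet parameter vector over activities per interaction.
    activity_word_dists: one distribution over visual words per atomic activity.
    """
    # Pick the clip's interaction (a uniform choice is an assumption of this sketch).
    c = rng.randrange(len(interaction_priors))
    # Clip-specific mixture over atomic activities, drawn from that interaction's prior.
    pi = sample_dirichlet(interaction_priors[c], rng)
    activities, words = [], []
    for _ in range(n_words):
        # Each moving pixel first picks an atomic activity, then a visual word from it.
        z = rng.choices(range(len(pi)), weights=pi)[0]
        word_dist = activity_word_dists[z]
        x = rng.choices(range(len(word_dist)), weights=word_dist)[0]
        activities.append(z)
        words.append(x)
    return c, activities, words


# Hypothetical sizes chosen only for illustration.
VOCAB_SIZE, N_ACTIVITIES = 12, 3
rng = random.Random(0)
activity_word_dists = [
    sample_dirichlet([0.5] * VOCAB_SIZE, rng) for _ in range(N_ACTIVITIES)
]
# Two interactions: one favors activities 0 and 1, the other favors activity 2.
interaction_priors = [[5.0, 5.0, 0.1], [0.1, 0.1, 5.0]]
c, activities, words = sample_clip(interaction_priors, activity_word_dists, 50, rng)
```

Learning runs in the opposite direction: given only the observed words of many clips, inference recovers the per-activity word distributions and the per-clip interaction labels, which corresponds to clustering moving pixels into atomic activities and clips into interactions.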
INDEX TERMS
Vision and Scene Understanding, Artificial Intelligence, Computing Methodologies, Video analysis, Machine learning, Motion, Applications, Statistical, Computer vision, Algorithms, Clustering, Pattern Recognition
CITATION
Xiaogang Wang, Xiaoxu Ma, W.E.L. Grimson, "Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.31, no. 3, pp. 539-555, March 2009, doi:10.1109/TPAMI.2008.87
REFERENCES
[1] D.M. Blei, A.Y. Ng, and M.I. Jordan, “Latent Dirichlet Allocation,” J. Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[2] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei, “Hierarchical Dirichlet Processes,” J. Am. Statistical Assoc., vol. 101, pp. 1566-1581, 2006.
[3] H. Zhong, J. Shi, and M. Visontai, “Detecting Unusual Activity in Video,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2004.
[4] L. Zelnik-Manor and M. Irani, “Event-Based Analysis of Video,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2001.
[5] A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin, Bayesian Data Analysis. Chapman and Hall/CRC, 2004.
[6] L. Fei-Fei and P. Perona, “A Bayesian Hierarchical Model for Learning Natural Scene Categories,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2005.
[7] C. Stauffer and E. Grimson, “Learning Patterns of Activity Using Real-Time Tracking,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 747-757, 2000.
[8] N. Oliver, B. Rosario, and A. Pentland, “A Bayesian Computer Vision System for Modeling Human Interactions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 831-843, 2000.
[9] X. Wang, K. Tieu, and E. Grimson, “Learning Semantic Scene Models by Trajectory Analysis,” Proc. Ninth European Conf. Computer Vision, 2006.
[10] S. Hongeng and R. Nevatia, “Multi-Agent Event Recognition,” Proc. Int'l Conf. Computer Vision, 2001.
[11] S.S. Intille and A.F. Bobick, “A Framework for Recognizing Multi-Agent Action from Visual Evidence,” Proc. 16th Nat'l Conf. Artificial Intelligence, 1999.
[12] I. Haritaoglu, D. Harwood, and L.S. Davis, “W4: Real-Time Surveillance of People and Their Activities,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 809-830, 2000.
[13] G. Medioni, I. Cohen, F. Brémond, S. Hongeng, and R. Nevatia, “Event Detection and Analysis from Video Streams,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, pp. 873-889, 2001.
[14] M. Brand and V. Kettnaker, “Discovery and Segmentation of Activities in Video,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 844-851, 2000.
[15] J. Fernyhough, A.G. Cohn, and D.C. Hogg, “Constructing Qualitative Event Models Automatically from Video Input,” Image and Vision Computing, vol. 18, pp. 81-103, 2000.
[16] N. Johnson and D. Hogg, “Learning the Distribution of Object Trajectories for Event Recognition,” Proc. Sixth British Machine Vision Conf., 1995.
[17] T.T. Truyen, D.Q. Phung, H.H. Bui, and S. Venkatesh, “AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2006.
[18] T. Xiang and S. Gong, “Beyond Tracking: Modelling Activity and Understanding Behaviour,” Int'l J. Computer Vision, vol. 67, pp. 21-51, 2006.
[19] N. Ghanem, D. Dementhon, D. Doermann, and L. Davis, “Representation and Recognition of Events in Surveillance Video Using Petri Net,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition Workshops, 2004.
[20] P. Smith, N.V. Lobo, and M. Shah, “TemporalBoost for Event Recognition,” Proc. Int'l Conf. Computer Vision, 2005.
[21] J.W. Davis and A.F. Bobick, “The Representation and Recognition of Action Using Temporal Templates,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 1997.
[22] T. Xiang and S. Gong, “Video Behaviour Profiling and Abnormality Detection without Manual Labelling,” Proc. Int'l Conf. Computer Vision, 2005.
[23] Y. Wang, T. Jiang, M.S. Drew, Z. Li, and G. Mori, “Unsupervised Discovery of Action Classes,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2006.
[24] C. Rao, A. Yilmaz, and M. Shah, “View-Invariant Representation and Recognition of Actions,” Int'l J. Computer Vision, vol. 50, pp. 203-226, 2002.
[25] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as Space-Time Shapes,” Proc. Int'l Conf. Computer Vision, 2005.
[26] J.C. Niebles, H. Wang, and F. Li, “Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words,” Proc. 16th British Machine Vision Conf., 2006.
[27] E. Shechtman and M. Irani, “Space-Time Behavior Based Correlation,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2005.
[28] I. Laptev and T. Lindeberg, “Space-Time Interest Points,” Proc. Int'l Conf. Computer Vision, 2003.
[29] J. Sivic, B.C. Russell, A.A. Efros, A. Zisserman, and W.T. Freeman, “Discovering Object Categories in Image Collections,” Proc. Int'l Conf. Computer Vision, 2005.
[30] B.C. Russell, A.A. Efros, J. Sivic, W.T. Freeman, and A. Zisserman, “Using Multiple Segmentations to Discover Objects and Their Extent in Image Collections,” Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2006.
[31] E.B. Sudderth, A. Torralba, W.T. Freeman, and A.S. Willsky, “Learning Hierarchical Models of Scenes, Objects, and Parts,” Proc. Int'l Conf. Computer Vision, 2005.
[32] E.B. Sudderth, A. Torralba, W.T. Freeman, and A.S. Willsky, “Describing Visual Scenes Using Transformed Dirichlet Processes,” Proc. Conf. Neural Information Processing Systems, 2005.
[33] B.D. Lucas and T. Kanade, “An Iterative Image Registration Technique with an Application to Stereo Vision,” Proc. Int'l Joint Conf. Artificial Intelligence, pp. 674-680, 1981.
[34] T.S. Ferguson, “A Bayesian Analysis of Some Nonparametric Problems,” The Annals of Statistics, vol. 1, pp. 209-230, 1973.
[35] S. MacEachern, A. Kottas, and A. Gelfand, “Spatial Nonparametric Bayesian Models,” technical report, Inst. of Statistics and Decision Sciences, Duke Univ., 2001.
[36] T. Hofmann, “Probabilistic Latent Semantic Analysis,” Proc. 15th Conf. Uncertainty in Artificial Intelligence, 1999.
[37] H. Schütze and C. Silverstein, “Projections for Efficient Document Clustering,” Proc. ACM Special Interest Group on Information Retrieval, 1997.
[38] I.S. Dhillon and D.S. Modha, “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, vol. 42, pp. 143-157, 2001.
[39] J. Zhang, Z. Ghahramani, and Y. Yang, “A Probabilistic Model for Online Document Clustering with Application to Novelty Detection,” Proc. Conf. Neural Information Processing Systems, 2004.
[40] I.S. Dhillon, “Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning,” Proc. ACM Special Interest Group on Knowledge Discovery and Data Mining, 2001.
[41] T.L. Griffiths and M. Steyvers, “Finding Scientific Topics,” Proc. Nat'l Academy of Sciences USA, 2004.
[42] Y.W. Teh, K. Kurihara, and M. Welling, “Collapsed Variational Inference for HDP,” Proc. Conf. Neural Information Processing Systems, 2007.
[43] X. Wang and E. Grimson, “Spatial Latent Dirichlet Allocation,” Proc. Conf. Neural Information Processing Systems, 2007.