Issue No. 12 - December 2010 (vol. 32)
pp. 2178-2190
David Liu, Siemens Corporate Research, Princeton
Gang Hua, Nokia Research Center Hollywood, Santa Monica
Tsuhan Chen, Cornell University, Ithaca
ABSTRACT
We propose a novel method for removing irrelevant frames from a video, given user-provided frame-level labels for a very small number of frames. We first hypothesize a number of windows that possibly contain the object of interest, and then determine which window(s) truly contain it. Our method enjoys several favorable properties. First, compared to approaches where a single descriptor is used to describe a whole frame, each window's feature descriptor has the chance of genuinely describing the object of interest; hence it is less affected by background clutter. Second, by considering the temporal continuity of a video instead of treating frames as independent, we can hypothesize the location of the windows more accurately. Third, by infusing prior knowledge into the patch-level model, we can precisely follow the trajectory of the object of interest. This allows us to substantially reduce the number of windows and hence reduce the chance of overfitting the data during learning. We demonstrate the effectiveness of the method by comparing it to several other semi-supervised learning approaches on challenging video clips.
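The window-hypothesis idea in the abstract can be illustrated with a generic multiple-instance sketch. This is not the authors' hierarchical model: it assumes a plain logistic window scorer and a noisy-OR combination, and every name here (window_score, frame_score, keep_relevant_frames, the weights, the threshold) is hypothetical. Each frame is treated as a bag of window descriptors, and the frame counts as relevant if at least one window is likely to contain the object.

```python
import math


def window_score(descriptor, weights, bias=0.0):
    """Logistic score in (0, 1): how likely one window contains the object.

    Assumption: a linear scorer stands in for whatever window model is used.
    """
    z = sum(w * x for w, x in zip(weights, descriptor)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def frame_score(windows, weights, bias=0.0):
    """Noisy-OR over windows: probability that at least one window is positive."""
    p_none = 1.0
    for descriptor in windows:
        p_none *= 1.0 - window_score(descriptor, weights, bias)
    return 1.0 - p_none


def keep_relevant_frames(frames, weights, bias=0.0, threshold=0.5):
    """Return the indices of frames whose noisy-OR score reaches the threshold."""
    return [
        index
        for index, windows in enumerate(frames)
        if frame_score(windows, weights, bias) >= threshold
    ]


# Toy data: two frames, two hypothesized windows each, 2-D descriptors.
frames = [
    [[3.0, 0.0], [-3.0, 0.0]],   # one window responds strongly
    [[-3.0, 0.0], [-3.0, 0.0]],  # no window responds
]
weights = [1.0, 0.0]
kept = keep_relevant_frames(frames, weights)
```

With this toy data only the first frame is kept, because a single strongly responding window is enough to mark the frame as relevant even though its other window looks like background. A whole-frame descriptor would instead average the two windows together, which is the clutter sensitivity the abstract contrasts against.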
INDEX TERMS
Topic model, probabilistic graphical model, Multiple Instance Learning, semi-supervised learning, object detection, video object summarization.
CITATION
David Liu, Gang Hua, Tsuhan Chen, "A Hierarchical Visual Model for Video Object Summarization", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 12, pp. 2178-2190, December 2010, doi:10.1109/TPAMI.2010.31