Issue No. 10 - Oct. 2012 (vol. 34)
pp. 1927-1941
Ming-Fang Weng, National Taiwan University, Taipei
Yung-Yu Chuang, National Taiwan University, Taipei
The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, recognizing the presence of concepts in a video segment, or extracting an objective linguistic description from it, remains a challenge because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim of narrowing this gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to reuse learned knowledge when it is applied to a new domain. To address these issues, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores, so that contextual and temporal relationships can be learned from them without additional human effort. Integrating the cues embedded within both the training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.
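The pseudolabeling idea in the abstract can be illustrated with a small sketch. This is not the paper's recursive learning algorithm or its fusion model; it only shows, under simple assumptions, how thresholded prediction scores on unseen shots could yield labels from which one interconcept statistic and one intershot statistic are estimated. The thresholds `hi` and `lo`, the function names, and the toy scores are all hypothetical.

```python
def pseudolabel(scores, hi=0.8, lo=0.2):
    """Map each score to 1 (confident positive), 0 (confident negative),
    or None (undecided). The thresholds are illustrative assumptions."""
    return [[1 if s >= hi else 0 if s <= lo else None for s in row]
            for row in scores]

def cooccurrence(labels, i, j):
    """Estimate P(concept j present | concept i present), using only
    shots where both concepts received a pseudolabel."""
    given = [row[j] for row in labels
             if row[i] == 1 and row[j] is not None]
    return sum(given) / len(given) if given else None

def temporal_persistence(labels, c):
    """Estimate P(concept c in shot t+1 | concept c in shot t), using
    only adjacent shot pairs where both shots are pseudolabeled."""
    nxt = [b[c] for a, b in zip(labels, labels[1:])
           if a[c] == 1 and b[c] is not None]
    return sum(nxt) / len(nxt) if nxt else None

# Toy initial detector scores: rows are consecutive shots,
# columns are two hypothetical concepts.
scores = [
    [0.90, 0.85],
    [0.95, 0.90],
    [0.50, 0.10],
    [0.10, 0.05],
    [0.90, 0.15],
]
labels = pseudolabel(scores)
contextual = cooccurrence(labels, 0, 1)       # interconcept cue
temporal = temporal_persistence(labels, 1)    # intershot cue
```

Shots whose scores fall between the two thresholds are left unlabeled and are ignored when the statistics are estimated, which keeps low-confidence predictions from contaminating the learned relationships.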
Context awareness, Indexing, Semantics, Feature extraction, Training data, Video annotation, Detectors, TRECVID, concept detection, cross-domain learning, contextual correlation, temporal dependency
Ming-Fang Weng, Yung-Yu Chuang, "Cross-Domain Multicue Fusion for Concept-Based Video Indexing", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.34, no. 10, pp. 1927-1941, Oct. 2012, doi:10.1109/TPAMI.2011.273