The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing
October 2006 (vol. 28 no. 10)
pp. 1678-1689
This paper presents the semantic pathfinder architecture for generic indexing of multimedia archives. The semantic pathfinder extracts semantic concepts from video by exploring different paths through three consecutive analysis steps, which we derive from the observation that produced video is the result of an authoring-driven process. We exploit this authoring metaphor for machine-driven understanding. The pathfinder starts with the content analysis step, in which we follow a data-driven approach to indexing semantics. In the second step, style analysis, we tackle the indexing problem by viewing a video from the perspective of production. Finally, in the context analysis step, we view semantics in context. The virtue of the semantic pathfinder is its ability to learn the best path of analysis steps on a per-concept basis. To show the generality of this novel indexing approach, we develop detectors for a lexicon of 32 concepts and evaluate the semantic pathfinder against the 2004 NIST TRECVID video retrieval benchmark, using a news archive of 64 hours. Top-ranking performance in the semantic concept detection task indicates the merit of the semantic pathfinder for generic indexing of multimedia archives.
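
The per-concept path selection described in the abstract can be illustrated with a minimal sketch. It assumes that each of the three analysis steps yields a confidence score per shot for a given concept, and that the best step is chosen by average precision on held-out validation shots. The abstract does not state the selection criterion, so average precision is an assumption here, and the names `STEPS`, `average_precision`, and `select_best_path` are hypothetical, not taken from the paper.

```python
from typing import Dict, Sequence

# Hypothetical labels for the three consecutive analysis steps.
STEPS = ("content", "style", "context")


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision of the ranking induced by scores."""
    # Rank shots from the highest to the lowest confidence score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot
    return precision_sum / hits if hits else 0.0


def select_best_path(step_scores: Dict[str, Sequence[float]],
                     labels: Sequence[int]) -> str:
    """Pick, for one concept, the step with the best validation ranking.

    Assumption: selection is by average precision on validation shots.
    Ties go to the earliest step in STEPS.
    """
    return max(STEPS, key=lambda step: average_precision(step_scores[step], labels))


# Toy validation data for a single concept (1 = concept present in the shot).
validation_labels = [1, 0, 1, 0]
validation_scores = {
    "content": [0.9, 0.8, 0.1, 0.2],
    "style":   [0.9, 0.1, 0.8, 0.2],
    "context": [0.1, 0.9, 0.2, 0.8],
}
best_step = select_best_path(validation_scores, validation_labels)
```

Running the selection once per concept in the lexicon would give each concept its own preferred analysis path, which is the behavior the abstract attributes to the pathfinder.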

Index Terms:
Video analysis, concept learning, benchmarking, content analysis and indexing, multimedia information systems, pattern recognition.
Citation:
Cees G.M. Snoek, Marcel Worring, Jan-Mark Geusebroek, Dennis C. Koelma, Frank J. Seinstra, Arnold W.M. Smeulders, "The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1678-1689, Oct. 2006, doi:10.1109/TPAMI.2006.212