Issue No.09 - September (2008 vol.30)
pp: 1520-1533
This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual, or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem and, in turn, to the query-by-example and relevance feedback paradigms widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e., TRECVID-05), indexed by visual, audio, and text features, is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The obtained results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.
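The general idea of a dissimilarity space can be illustrated with a minimal sketch. It is not the construction defined in the paper: the choice of the positive query examples as prototypes, the Euclidean distance, the RBF kernel, the averaging of per-modality kernels, and all names (`dissimilarity_space`, `fused_kernel`, `gammas`) are assumptions made here for illustration only, and the data are random toy values.

```python
import numpy as np

def dissimilarity_space(features, prototypes):
    """Describe each document by its Euclidean distances to the prototypes."""
    diff = features[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diff, axis=2)

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def fused_kernel(modalities, positive_idx, gammas):
    """Average per-modality RBF kernels built in each dissimilarity space.

    Assumption: the positive query examples serve as prototypes.
    """
    n_docs = next(iter(modalities.values())).shape[0]
    kernel = np.zeros((n_docs, n_docs))
    for name, feats in modalities.items():
        d = dissimilarity_space(feats, feats[positive_idx])
        kernel += rbf_kernel(d, d, gammas[name])
    return kernel / len(modalities)

# Toy corpus: 6 documents with random "visual" and "audio" feature vectors.
rng = np.random.default_rng(0)
modalities = {
    "visual": rng.normal(size=(6, 4)),
    "audio": rng.normal(size=(6, 3)),
}
positive_idx = [0, 1]  # documents the user marked as relevant (hypothetical)

K = fused_kernel(modalities, positive_idx, {"visual": 0.5, "audio": 0.5})

# Naive query-by-example ranking: mean kernel similarity to the positives.
scores = K[:, positive_idx].mean(axis=1)
ranking = np.argsort(-scores)
print(ranking)
```

In a relevance feedback loop, a fused kernel of this kind could be passed to a kernel classifier instead of the naive mean-similarity score used above; how the kernels are actually set and combined is the subject of the paper itself.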
Multimedia databases, Image/video retrieval, Concept learning, Machine learning
Eric Bruno, Nicolas Moenne-Loccoz, Stéphane Marchand-Maillet, "Design of Multimodal Dissimilarity Spaces for Retrieval of Video Documents", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 9, pp. 1520-1533, September 2008, doi:10.1109/TPAMI.2007.70801