Learning Image-Text Associations
February 2009 (vol. 21 no. 2)
pp. 161-177
Tao Jiang, Nanyang Technological University, Singapore
Ah-Hwee Tan, Nanyang Technological University, Singapore
Web information fusion can be defined as the problem of collating and tracking information related to specific topics on the World Wide Web. Whereas most existing work on web information fusion has focused on text-based multidocument summarization, this paper addresses image-text association, a cornerstone of cross-media web information fusion. Specifically, we present two learning methods for discovering the underlying associations between images and texts from small training data sets. The first method, based on vague transformation, measures the information similarity between visual features and textual features through a set of predefined domain-specific information categories. The second method uses a neural network to learn a direct mapping between the visual and textual features by automatically and incrementally summarizing the associated features into a set of information templates. Despite their distinct approaches, our experimental results on a terrorist-domain document set show that both methods are capable of learning associations between images and texts from a small training data set.
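The category-mediated idea behind the first method can be illustrated with a small sketch. This is not the authors' algorithm: the weight tables, the toy feature vectors, the function names, and the choice of cosine similarity are all assumptions made for illustration. It only shows how visual and textual features might be compared after each is projected onto a shared set of information categories.

```python
from math import sqrt

def project(features, weights):
    """Map a feature vector into category space.

    weights[f][c] is a hypothetical association strength between
    feature f and information category c.
    """
    n_categories = len(weights[0])
    return [
        sum(value * row[c] for value, row in zip(features, weights))
        for c in range(n_categories)
    ]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cross_media_similarity(visual, textual, visual_to_cat, text_to_cat):
    """Compare an image and a text through their category profiles."""
    return cosine(project(visual, visual_to_cat),
                  project(textual, text_to_cat))

# Toy, made-up association tables: rows are features, columns are categories.
VISUAL_TO_CAT = [[0.9, 0.1],
                 [0.2, 0.8],
                 [0.5, 0.5]]
TEXT_TO_CAT = [[0.8, 0.2],
               [0.1, 0.9]]

image = [1.0, 0.0, 0.0]   # only the first visual feature is present
text_a = [1.0, 0.0]       # term mostly tied to category 0
text_b = [0.0, 1.0]       # term mostly tied to category 1

sim_a = cross_media_similarity(image, text_a, VISUAL_TO_CAT, TEXT_TO_CAT)
sim_b = cross_media_similarity(image, text_b, VISUAL_TO_CAT, TEXT_TO_CAT)
```

In this toy setup the image scores higher against text_a than text_b, because both the image and text_a load mainly on the same category. In practice the association tables would have to be estimated from training data rather than written by hand.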

Index Terms:
Data mining, multimedia data mining, image-text association mining.
Tao Jiang, Ah-Hwee Tan, "Learning Image-Text Associations," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 2, pp. 161-177, Feb. 2009, doi:10.1109/TKDE.2008.150