Computer Vision, IEEE International Conference on (2005)
Beijing, China
Oct. 17, 2005 to Oct. 20, 2005
ISSN: 1550-5499
ISBN: 0-7695-2334-X
pp: 846-851
Ruofei Zhang , State University of New York at Binghamton
Zhongfei (Mark) Zhang , State University of New York at Binghamton
Mingjing Li , Microsoft Research Asia
Wei-Ying Ma , Microsoft Research Asia
Hong-Jiang Zhang , Microsoft Research Asia
ABSTRACT
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which visual features and textual words are connected via a hidden layer of semantic concepts to be discovered, explicitly exploiting the synergy between the two modalities. (2) The association of visual features and textual words is determined in a Bayesian framework, so that a confidence for each association can be provided. (3) An extensive evaluation of the prototype system built on the model is reported on a large-scale, visually and semantically diverse image collection crawled from the Web. In the proposed probabilistic model, a hidden concept layer connecting the visual-feature layer and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7,736 annotation words automatically extracted from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
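To make the abstract's hidden-concept idea concrete, the sketch below shows an EM procedure for a PLSA-style model in which visual evidence and annotation words are linked through a latent concept layer, followed by a Bayesian annotation step that yields a confidence for each candidate word. This is an illustrative simplification, not the authors' exact formulation: the paper models continuous visual features within the generative model, whereas here the visual features are assumed to be pre-quantized into discrete tokens so that everything stays multinomial; all function and variable names are hypothetical.

```python
# Minimal sketch (assumptions noted above): EM for a latent-concept model over
# co-occurrence counts of discretized visual tokens and annotation words.
import numpy as np

def fit_hidden_concepts(counts, n_concepts, n_iters=50, seed=0):
    """counts[v, w] = co-occurrence count of visual token v and word w
    across the training set; returns P(z), P(v|z), P(w|z)."""
    rng = np.random.default_rng(seed)
    n_visual, n_words = counts.shape
    p_z = np.full(n_concepts, 1.0 / n_concepts)
    p_v_given_z = rng.dirichlet(np.ones(n_visual), size=n_concepts)  # (z, v)
    p_w_given_z = rng.dirichlet(np.ones(n_words), size=n_concepts)   # (z, w)

    for _ in range(n_iters):
        # E-step: posterior P(z | v, w) for every (visual token, word) pair.
        joint = (p_z[:, None, None]
                 * p_v_given_z[:, :, None]
                 * p_w_given_z[:, None, :])           # shape (z, v, w)
        posterior = joint / joint.sum(axis=0, keepdims=True)

        # M-step: re-estimate the multinomials from expected counts.
        expected = posterior * counts[None, :, :]     # shape (z, v, w)
        z_mass = expected.sum(axis=(1, 2))
        p_z = z_mass / z_mass.sum()
        p_v_given_z = expected.sum(axis=2) / z_mass[:, None]
        p_w_given_z = expected.sum(axis=1) / z_mass[:, None]

    return p_z, p_v_given_z, p_w_given_z

def annotate(visual_token, p_z, p_v_given_z, p_w_given_z, top_k=5):
    """Bayesian annotation sketch: P(w | v) = sum_z P(z | v) P(w | z);
    the probability itself serves as the confidence of each association."""
    p_z_given_v = p_z * p_v_given_z[:, visual_token]
    p_z_given_v /= p_z_given_v.sum()
    p_w_given_v = p_z_given_v @ p_w_given_z
    order = np.argsort(p_w_given_v)[::-1][:top_k]
    return order, p_w_given_v[order]
```

In this reading, retrieval and annotation both reduce to posterior inference through the concept layer, which is what lets the confidence of each word-image association fall out of the model rather than being assigned heuristically.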
CITATION

Z. (M.) Zhang, H. Zhang, M. Li, W. Ma and R. Zhang, "A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieval," Computer Vision, IEEE International Conference on (ICCV), Beijing, China, 2005, pp. 846-851.
doi:10.1109/ICCV.2005.16