Citation:
Yansong Feng, Mirella Lapata, "Automatic Caption Generation for News Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 4, pp. 797-812, April 2013, doi:10.1109/TPAMI.2012.118
BibTeX:
@article{10.1109/TPAMI.2012.118,
  author = {Yansong Feng and Mirella Lapata},
  title = {Automatic Caption Generation for News Images},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume = {35},
  number = {4},
  issn = {0162-8828},
  year = {2013},
  pages = {797-812},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.118},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
This paper is concerned with the task of automatically generating captions for images, which is important for many image-related applications, such as video and image retrieval and tools that help visually impaired individuals access pictorial information. Our approach leverages the vast resource of pictures available on the web and the fact that many of them are captioned and co-located with thematically related documents. Our model learns to create captions from a database of news articles, the pictures embedded in them, and their captions. It consists of two stages: content selection identifies what the image and accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. The model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics) and is trained on a weakly labeled dataset, which treats the captions and associated news articles as image labels. Inspired by recent work in summarization, we propose extractive and abstractive surface realization models. Experimental results show that it is viable to generate captions that are pertinent to the specific content of an image and its associated article while permitting creativity in the description. Indeed, the output of our abstractive model compares favorably to handwritten captions and is often superior to extractive methods.
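To make the two-stage design concrete, here is a toy sketch. It is not the authors' model: it assumes the image's topic mixture has already been inferred (the hypothetical `topic_mix` input) and that per-topic word distributions are given (`topic_word`). Content selection scores each keyword as P(w | image) = Σ_k P(w | k) P(k | image), and an extractive realizer picks the article sentence carrying the most keyword probability. All function names are hypothetical, and the abstractive realizer is not shown.

```python
# Hypothetical toy sketch of a two-stage caption pipeline; NOT the authors' model.
import re
from typing import Dict, List


def keyword_distribution(topic_mix: List[float],
                         topic_word: List[Dict[str, float]]) -> Dict[str, float]:
    """Content selection: score words as P(w | image) = sum_k P(w | k) * P(k | image).

    topic_mix[k] is the (assumed already inferred) weight of topic k for the image;
    topic_word[k] maps words to P(w | k).
    """
    scores: Dict[str, float] = {}
    for weight, word_dist in zip(topic_mix, topic_word):
        for word, p in word_dist.items():
            scores[word] = scores.get(word, 0.0) + weight * p
    return scores


def top_keywords(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring keywords (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def extractive_caption(article: str, scores: Dict[str, float]) -> str:
    """Surface realization (extractive): return the article sentence whose
    distinct lowercase words carry the most keyword probability mass."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]

    def sentence_score(sentence: str) -> float:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        return sum(scores.get(token, 0.0) for token in tokens)

    return max(sentences, key=sentence_score)


if __name__ == "__main__":
    topic_word = [
        {"flood": 0.5, "rain": 0.3, "river": 0.2},  # a "weather" topic
        {"minister": 0.6, "election": 0.4},          # a "politics" topic
    ]
    scores = keyword_distribution([0.8, 0.2], topic_word)
    print(top_keywords(scores, 3))  # → ['flood', 'rain', 'river']
    article = ("The minister visited the region on Monday. "
               "Heavy rain caused the river to flood several villages. "
               "Officials expect more storms.")
    print(extractive_caption(article, scores))
    # → Heavy rain caused the river to flood several villages.
```

Scoring each sentence over its set of distinct words keeps a repeated keyword from counting twice; a fuller system would likely also normalize for sentence length, handle stemming, and guard against an empty article.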
Index Terms:
Visualization, Humans, Databases, Vocabulary, Probabilistic logic, Data models, Noise measurement, topic models, Caption generation, image annotation, summarization

