2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM) (2018)
Xi'an
Sept. 13, 2018 to Sept. 16, 2018
ISBN: 978-1-5386-5322-7
pp: 1-5
Yuanen Zhou, Hefei University of Technology, Hefei, China
Zhenzhen Hu, Hefei University of Technology, Hefei, China
Ye Zhac, Hefei University of Technology, Hefei, China
Xueliang Liu, Hefei University of Technology, Hefei, China
Richang Hong, Hefei University of Technology, Hefei, China
ABSTRACT
The attention mechanism plays an important role in image understanding and has proven effective for generating natural language descriptions of images. In recent years, with the advance of deep neural networks, visual attention has been widely exploited in encoder-decoder frameworks. On the one hand, prior work shows that guidance captions can help the model attend to relevant image regions and suppress unimportant ones during the image encoding stage, especially for cluttered images. On the other hand, visual attention has been well exploited during the decoding stage. Observing that the two are naturally complementary, we propose a two-side attention model that combines both attention mechanisms seamlessly and is associated with a coarse-to-fine attention mechanism. The original text-guided attention model operates on region-level image features, which lack definite semantic information and lead to unsatisfactory attention visualizations. We alleviate this problem by computing attention over object-level image features, which improves performance and yields more interpretable attention visualizations. Experiments on the MSCOCO dataset demonstrate consistent improvement over the text-guided attention model for image captioning.
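The two-side idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the re-weighting of features by the encoder-side weights, and all names (`attend`, `two_side_attention`, `object_features`, `guidance_vec`, `decoder_state`) are assumptions made for illustration only. It shows a guidance-caption embedding producing a coarse attention over object-level features, followed by a finer decoder-side attention over the re-weighted features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(features, query):
    """Soft attention: weight each feature by its scaled dot product with the query.

    Dot-product scoring is an assumption; the paper may use a different scorer.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(f, query) / scale for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context


def two_side_attention(object_features, guidance_vec, decoder_state):
    """Hypothetical two-side attention over object-level features."""
    # Encoder side (coarse): the guidance caption embedding scores each object.
    enc_weights, _ = attend(object_features, guidance_vec)
    # Suppress objects the guidance caption considers unimportant (assumed scheme).
    reweighted = [[w * x for x in f] for w, f in zip(enc_weights, object_features)]
    # Decoder side (fine): the current decoder state attends over the result.
    dec_weights, context = attend(reweighted, decoder_state)
    return enc_weights, dec_weights, context


if __name__ == "__main__":
    # Three toy object features of dimension 2 (made-up values).
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    enc_w, dec_w, ctx = two_side_attention(feats, [2.0, 0.0], [1.0, 0.0])
    print("encoder-side weights:", enc_w)
    print("decoder-side weights:", dec_w)
    print("context vector:", ctx)
```

Because each attention weight belongs to a single detected object rather than a grid cell, the weights can be read directly as "how much this object mattered", which is the interpretability benefit the abstract attributes to object-level features.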
INDEX TERMS
data visualisation, decoding, encoding, feature extraction, image segmentation, natural language processing, neural nets, text analysis
CITATION

Y. Zhou, Z. Hu, Y. Zhac, X. Liu and R. Hong, "Enhanced Text-Guided Attention Model for Image Captioning," 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM), Xi'an, 2018, pp. 1-5.
doi:10.1109/BigMM.2018.8499172