2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM) (2018)
Xi'an
Sept. 13, 2018 to Sept. 16, 2018
ISBN: 978-1-5386-5322-7
pp: 1-7
Linghui Li , Chinese Academy of Sciences, Key lab of Intelligent Information Processing Institute of Computing Technology, University of the Chinese Academy of Sciences, Beijing, China
Sheng Tang , Chinese Academy of Sciences, Key lab of Intelligent Information Processing Institute of Computing Technology, Beijing, China
Junbo Guo , Chinese Academy of Sciences, Key lab of Intelligent Information Processing Institute of Computing Technology, Beijing, China
Rui Wang , China Academy of Electronics and Information Technology, Innovation Center, Beijing, China
Bo Lyu , China Academy of Electronics and Information Technology, Innovation Center, Beijing, China
Qi Tian , Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas
Yongdong Zhang , China Academy of Electronics and Information Technology, Innovation Center, Beijing, China
ABSTRACT
Most recent pioneering work on image captioning is based on supervised learning and therefore depends heavily on labeled training data. Through careful observation, we note that these approaches suffer from class imbalance (CIB), which can degrade performance and limit the diversity of generated sentences. To address this problem, we propose a pipeline for image captioning based on an adaptive balancing loss (ABL), which re-weights the loss of each category dynamically over the training process. The proposed method improves the accuracy and increases the diversity of generated descriptions by adaptively reducing the losses of well-classified and frequent categories and increasing the losses of under-classified and infrequent categories. We conduct experiments on the well-known MS COCO caption dataset to evaluate the proposed method. The results show that our approach achieves competitive performance compared with state-of-the-art methods and generates more accurate and diverse captions.
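The re-weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the ABL formula, so everything below is an assumption made for illustration: the function names `token_weight` and `adaptive_balanced_loss`, the parameters `gamma` and `beta`, and the choice to combine a focal-style difficulty term with an inverse-frequency rarity term. It should not be read as the authors' method, only as one plausible way to shrink the loss of well-classified, frequent words and enlarge the loss of under-classified, infrequent ones.

```python
import math


def token_weight(p_t, count, total, num_classes, gamma=2.0, beta=0.5):
    """Hypothetical per-token weight (an assumption, not the paper's formula).

    p_t   : probability the model currently assigns to the target word
    count : how often the target word occurs in the training captions (> 0)
    """
    # Difficulty term: shrinks toward 0 as the target word becomes well classified.
    # It changes during training because p_t changes, which makes the weight dynamic.
    difficulty = (1.0 - p_t) ** gamma
    # Rarity term: above 1 for words rarer than average, below 1 for frequent words.
    rarity = (total / (num_classes * count)) ** beta
    return difficulty * rarity


def adaptive_balanced_loss(probs, targets, class_counts, gamma=2.0, beta=0.5):
    """Mean re-weighted cross-entropy over a sequence of predicted words.

    probs        : list of probability distributions, one per predicted word
    targets      : list of target word indices
    class_counts : training-set frequency of every word in the vocabulary
    """
    total = sum(class_counts)
    num_classes = len(class_counts)
    losses = []
    for dist, t in zip(probs, targets):
        p_t = max(dist[t], 1e-12)  # guard against log(0)
        w = token_weight(p_t, class_counts[t], total, num_classes, gamma, beta)
        losses.append(-w * math.log(p_t))
    return sum(losses) / len(losses)


# Toy vocabulary of three words with very unequal frequencies.
counts = [900, 90, 10]
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
targets = [0, 2]
print(adaptive_balanced_loss(probs, targets, counts))
```

With `gamma=0` and `beta=0` both terms equal 1 and the sketch reduces to plain cross-entropy, which makes it easy to compare against an unweighted baseline.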
INDEX TERMS
image processing, learning (artificial intelligence), pattern classification, text analysis
CITATION

L. Li et al., "Image Captioning Based on Adaptive Balancing Loss," 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM), Xi'an, 2018, pp. 1-7.
doi:10.1109/BigMM.2018.8499066