2017 IEEE International Conference on Multimedia and Expo (ICME) (2017)
Hong Kong, Hong Kong
July 10, 2017 to July 14, 2017
ISSN: 1945-788X
ISBN: 978-1-5090-6068-9
pp: 829-834
Kuniaki Saito , The University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo, Japan
Andrew Shin , The University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo, Japan
Yoshitaka Ushiku , The University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo, Japan
Tatsuya Harada , The University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo, Japan
ABSTRACT
Visual question answering (VQA) tasks use two types of images: abstract scenes (illustrations) and real images. Domain-specific differences exist between the two with respect to “objectness,” “texture,” and “color.” Consequently, it is difficult to achieve comparable performance by applying methods developed for real images to abstract images, and vice versa. This is a critical problem in VQA, because image features are crucial clues for correctly answering questions about the images. Conversely, an effective, domain-invariant method can provide insight into the high-level reasoning required for VQA. We therefore propose a method called DualNet, whose performance is invariant to the differences between the real and abstract scene domains. Experimental results show that DualNet outperforms state-of-the-art methods, especially in the abstract image category.
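The abstract does not spell out DualNet's architecture, so the following is only a minimal, hypothetical sketch of a domain-invariant VQA model in PyTorch: pre-extracted image and question features are fused through parallel element-wise addition and multiplication, and both views are concatenated before classification over a fixed answer vocabulary. The class name `DualBranchFusionVQA`, the feature dimensions, and the layer sizes are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch only: the abstract does not specify DualNet's layers,
# so this dual-branch fusion (element-wise sum and product in parallel) is an
# illustrative assumption, not the authors' exact model.
import torch
import torch.nn as nn


class DualBranchFusionVQA(nn.Module):
    """Fuse image and question features additively and multiplicatively,
    then classify over a fixed set of candidate answers."""

    def __init__(self, img_dim=2048, q_dim=1024, hidden=1200, num_answers=1000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)   # project CNN image feature
        self.q_proj = nn.Linear(q_dim, hidden)       # project question embedding
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, img_feat, q_feat):
        v = torch.tanh(self.img_proj(img_feat))
        q = torch.tanh(self.q_proj(q_feat))
        # Parallel additive and multiplicative fusion, concatenated.
        fused = torch.cat([v + q, v * q], dim=-1)
        return self.classifier(fused)                # logits over candidate answers


if __name__ == "__main__":
    model = DualBranchFusionVQA()
    img = torch.randn(4, 2048)    # e.g., pooled CNN image features
    qst = torch.randn(4, 1024)    # e.g., an LSTM question encoding
    print(model(img, qst).shape)  # torch.Size([4, 1000])
```

Because the fusion operates only on generic feature vectors, the same model could in principle be trained on either real or abstract scene features without architectural changes, which is the kind of domain invariance the abstract describes.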
INDEX TERMS
Feature extraction, Visualization, Knowledge discovery, Data mining, Network architecture, Training, Image color analysis
CITATION

K. Saito, A. Shin, Y. Ushiku and T. Harada, "DualNet: Domain-invariant network for visual question answering," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, Hong Kong, 2017, pp. 829-834.
doi:10.1109/ICME.2017.8019436