2017 IEEE International Conference on Computer Vision (ICCV) (2017)
Oct. 22, 2017 to Oct. 29, 2017
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICCV.2017.163
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. We first fine-tune GoogleNet by jointly modeling clothing images and their corresponding descriptions in a visual-semantic embedding space. Then, for each attribute (word), we generate its spatially-aware representation by combining its semantic word vector representation with its spatial representation derived from the convolutional maps of the fine-tuned network. The resulting spatially-aware representations are further used to cluster attributes into multiple groups to form spatially-aware concepts (e.g., the neckline concept might consist of attributes like v-neck, round-neck, etc). Finally, we decompose the visual-semantic embedding space into multiple concept-specific subspaces, which facilitates structured browsing and attribute-feedback product retrieval by exploiting multimodal linguistic regularities. We conducted extensive experiments on our newly collected Fashion200K dataset, and results on clustering quality evaluation and attribute-feedback product retrieval task demonstrate the effectiveness of our automatically discovered spatially-aware concepts.
clothing, data mining, image representation, image retrieval, learning (artificial intelligence), pattern clustering, text analysis, vectors
X. Han et al., "Automatic Spatially-Aware Fashion Concept Discovery," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2018, pp. 1472-1480.