2010 IEEE International Conference on Data Mining (2010)
Dec. 13, 2010 to Dec. 17, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2010.150
Given a large collection of images, very few of which have labels, how can we guess the labels of the remaining majority, and how can we spot those images that need brand-new labels, different from the existing ones? Current automatic labeling techniques usually scale super-linearly with the data size, and/or they fail when only a tiny amount of labeled data is provided. In this paper, we propose QMAS (Querying, Mining And Summarization of Multi-modal Databases), a fast solution to the following problems: (i) low-labor labeling (L3): given a collection of images, very few of which are labeled with keywords, find the most suitable labels for the remaining ones; and (ii) mining and attention routing: in the same setting, find clusters, the top-N_O outlier images, and the top-N_R representative images. We report experiments on real satellite images: two large sets (1.5 GB and 2.25 GB) of proprietary images and a smaller set (17 MB) of public images. We show that QMAS scales linearly with the data size, being up to 40 times faster than top competitors (GCap) while obtaining equal or better accuracy. In contrast to other methods, QMAS does low-labor labeling (L3), that is, it works even with tiny initial label sets. It also solves both of the presented problems and spots tiles that potentially require new labels.
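To make the low-labor labeling setting concrete, the sketch below propagates a single seed label over a similarity graph using random walk with restart (RWR), the graph-based technique underlying competitors such as GCap; it illustrates the problem, not the QMAS algorithm itself, and the toy graph and label are hypothetical.

```python
import numpy as np

def rwr_scores(adj, restart, c=0.15, iters=100):
    """Steady-state RWR scores: walk on `adj`, restarting to `restart` w.p. c."""
    # Column-normalize the adjacency matrix into a transition matrix.
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # guard against dangling nodes
    P = adj / col_sums
    r = restart / restart.sum()
    x = r.copy()
    for _ in range(iters):
        x = (1 - c) * P @ x + c * r  # power iteration to the fixed point
    return x

# Toy similarity graph over 5 images: 0-1-2 form one visual cluster,
# 3-4 another, with a single bridge edge (2,3).
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Only image 0 carries a label; score every image for that label.
seed = np.zeros(5)
seed[0] = 1.0
scores = rwr_scores(adj, seed)
# Image 1 (same cluster as the seed) outranks image 4 (far cluster),
# so it would inherit the seed's label first.
print(scores[1] > scores[4])  # True
```

With a tiny label set, each labeled image seeds one such walk, and unlabeled images take the label whose walk scores them highest; an image scored low by every walk is a candidate for a brand-new label.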
Automatic labeling, Clustering, Summarization, Multi-modal databases
F. Guo et al., "QMAS: Querying, Mining and Summarization of Multi-modal Databases," 2010 IEEE International Conference on Data Mining (ICDM), Sydney, Australia, 2010, pp. 785-790.