Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06) (2006)
Hong Kong, China
Dec. 18, 2006 to Dec. 22, 2006
Brigitte Mathiak, TU Braunschweig, Germany
Andreas Kupfer, TU Braunschweig, Germany
Tatjana Scope, TU Braunschweig, Germany
Britta Stormann, TU Braunschweig, Germany
Silke Eckstein, TU Braunschweig, Germany
The aim of literature retrieval is to find significant papers on a given topic. In previous publications, we examined selecting these papers based on the pictures they include. To refine this approach, we seek to employ picture classification to further narrow down the number of interesting pictures presented. This can be useful, for example, when looking for the results of specific experiments. The classification can also serve as a data cleansing step that omits all pictures not used as figures. We use a method originally designed to distinguish between photos and computer-generated pictures on the web. We show that this method can be used not only to distinguish between raw data and derived representation figures, but also to reliably eliminate non-figure pictures in the document, such as text pages and logos. We tested this approach on two data sets with different topics and different non-figure problems, both with satisfactory results.
A. Kupfer, S. Eckstein, B. Mathiak, T. Scope and B. Stormann, "Using image classification for biomedical literature retrieval," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), Hong Kong, China, 2006, pp. 185-189.