Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06) (2006)
Hong Kong, China
Dec. 18, 2006 to Dec. 22, 2006
Margherita Berardi, Università degli Studi di Bari, via Orabona, 4 - 70126 Bari - Italy
Donato Malerba, Università degli Studi di Bari, via Orabona, 4 - 70126 Bari - Italy
Marcella Attimonelli, Università degli Studi di Bari, via Orabona, 4 - 70126 Bari - Italy
Advances in genome sequencing techniques have produced an overwhelming increase in the literature on discovered genes, proteins and their roles in biological processes. However, the biomedical literature remains a largely unexploited source of biological information. Information Extraction (IE) techniques are necessary to map this information into structured representations that allow facts relating domain-relevant entities to be automatically recognized. In this paper, we present a framework that supports biologists in the task of automatic extraction of information from texts. The framework integrates a data mining module that discovers extraction rules from a set of manually labelled texts. The learned extraction models are subsequently applied automatically to unseen texts. We report an application to a real-world dataset composed of publications selected to support biologists in the annotation of the HmtDB database.
M. Berardi, M. Attimonelli and D. Malerba, "Mining Information Extraction Models for HmtDB annotation," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), Hong Kong, China, 2006, pp. 207-212.