2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
Xiaoyan Wang, School of Computer, Central China Normal University, Wuhan, 430079, China
Xingpeng Jiang, School of Computer, Central China Normal University, Wuhan, 430079, China
Mengwen Liu, College of Computing & Informatics, Drexel University, Philadelphia, PA, 19104, USA
Tingting He, School of Computer, Central China Normal University, Wuhan, 430079, China
Xiaohua Hu, School of Computer, Central China Normal University, Wuhan, 430079, China
There are intensive computational efforts to discover large-scale microbial interactions from metagenomic abundance data; however, it is often difficult to validate such inferred interactions without a manually curated dataset. Many small-scale microbial interactions have also been reported in the literature, backed by experimental evidence. Text mining can be employed to extract such microbial interactions from the biomedical literature, which could be a significant complement to abundance-based methods. The key tasks of text mining include named entity recognition and relation extraction. Named entity recognition identifies names of a specified type in text. We manually annotated a corpus of 1344 abstracts from the microbial literature for the task of bacterial named entity recognition. Six new features were added to the general features used in the biomedical field. Based on a bacterial dictionary and a conditional random field (CRF), a bacterial named entity recognition model was trained; it achieved a precision of 89.118%, a recall of 81.598%, and an F-measure of 85.192%. The system and template are available at https://github.com/bluelilywxy/BacNER-V1.0.git.
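The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the weighting, and the function name `f_measure` is illustrative rather than taken from the BacNER code.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: the harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, in percent.
reported_precision = 89.118
reported_recall = 81.598

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))
```

Under this assumption the computed value agrees with the reported 85.192% to three decimal places.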
Microorganisms, Feature extraction, Vocabulary, Hidden Markov models, Biological system modeling, Training
X. Wang, X. Jiang, M. Liu, T. He and X. Hu, "Bacterial named entity recognition based on dictionary and conditional random field," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 439-444.