Issue No. 03 - July-September (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.46
Rune Sætre, University of Tokyo, Tokyo
Kazuhiro Yoshida, University of Tokyo, Tokyo
Makoto Miwa, University of Tokyo, Tokyo
Takuya Matsuzaki, University of Tokyo, Tokyo
Yoshinobu Kano, University of Tokyo, Tokyo
Jun'ichi Tsujii, University of Tokyo, Tokyo and University of Manchester, Manchester
Currently, relation extraction (RE) and event extraction (EE) are the two main streams of biological information extraction. In 2009, the majority of these RE and EE research efforts were centered around the BioCreative II.5 Protein-Protein Interaction (PPI) challenge and the “BioNLP event extraction shared task.” Although these challenges took somewhat different approaches, they share the same ultimate goal of extracting bio-knowledge from the literature. This paper compares the two challenge task definitions and presents a unified system that was successfully applied in both of these and in several other PPI extraction task settings. The AkaneRE system has three parts: a core engine for RE, a pool of modules for specific solutions, and a configuration language to adapt the system to different tasks. The core engine is based on machine learning, using either support vector machines or statistical classifiers together with features extracted from the given training data. The specific modules solve tasks such as sentence boundary detection, tokenization, stemming, part-of-speech tagging, parsing, named entity recognition, generation of potential relations, generation of machine learning features for each relation, and finally, assignment of confidence scores and ranking of candidate relations. With these components, the AkaneRE system produces state-of-the-art results, and the system is freely available for academic purposes at http://www-tsujii.is.s.u-tokyo.ac.jp/satre/akane/.
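The module chain named in the abstract (sentence boundary detection, tokenization, entity recognition, candidate relation generation, feature generation, scoring, and ranking) can be illustrated with a minimal Python sketch. This is not the AkaneRE implementation. The protein lexicon `PROTEINS`, the hand-set `WEIGHTS` table (standing in for a trained support vector machine or statistical classifier), and all function names are assumptions made purely for illustration.

```python
import itertools
import re

# Hypothetical protein lexicon; a real system would use a trained NER module.
PROTEINS = {"RAD51", "BRCA2", "TP53"}

# Hand-set weights standing in for a trained classifier (illustrative assumption).
WEIGHTS = {"binds": 2.0, "interacts": 2.0, "not": -1.5}


def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenization into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def candidate_pairs(tokens):
    """Index pairs for every two distinct protein mentions in one sentence."""
    positions = [i for i, tok in enumerate(tokens) if tok in PROTEINS]
    return [
        (i, j)
        for i, j in itertools.combinations(positions, 2)
        if tokens[i] != tokens[j]
    ]


def features(tokens, i, j):
    """Feature generation: lowercased words between the two mentions."""
    return [tok.lower() for tok in tokens[i + 1:j]]


def score(feats):
    """Confidence score: sum of the weights of the observed features."""
    return sum(WEIGHTS.get(f, 0.0) for f in feats)


def extract(text):
    """Run the toy pipeline and return candidates ranked by descending score."""
    results = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for i, j in candidate_pairs(tokens):
            results.append((score(features(tokens, i, j)), tokens[i], tokens[j]))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Calling `extract("BRCA2 binds RAD51. TP53 does not bind BRCA2.")` ranks the BRCA2/RAD51 pair above the TP53/BRCA2 pair, because the toy weights reward "binds" and penalize "not". In a real system, each stage would be a separately configurable module and the weights would be learned from annotated training data.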
Text mining, machine learning, language parsing and understanding, bioinformatics (genome or protein) databases.
R. Sætre, K. Yoshida, M. Miwa, T. Matsuzaki, Y. Kano, and J. Tsujii, "Extracting Protein Interactions from Text with the Unified AkaneRE Event Extraction System," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 442-453, 2010.