Issue No. 03, July-September 2010 (vol. 7)
pp. 472-480
Fabio Rinaldi, University of Zurich
Gerold Schneider, University of Zurich, Zurich
Kaarel Kaljurand, University of Zurich, Zurich
Simon Clematide, University of Zurich, Zurich
Thérèse Vachon, Novartis Pharma AG, NITAS, Text Mining Services, Basel
Martin Romacker, Novartis Pharma AG, NITAS, Text Mining Services, Basel
ABSTRACT
We describe a system for detecting mentions of protein-protein interactions in the biomedical scientific literature. The original system was developed as part of the OntoGene project, which applies advanced computational linguistic techniques to text mining in the biomedical domain. In this paper, we focus in particular on our participation in the BioCreative II.5 challenge, where the OntoGene system achieved best-ranked results. Additionally, we describe a feature-analysis experiment performed after the challenge, which yields the unexpected result that a single feature alone performs better than the combination of features used in the challenge.
INDEX TERMS
Biomedical text mining, Natural Language Processing (NLP), protein interactions, BioCreative.
CITATION
Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Simon Clematide, Thérèse Vachon, Martin Romacker, "OntoGene in BioCreative II.5," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 472-480, July-September 2010, doi:10.1109/TCBB.2010.50