Issue No.03 - July-September (2010 vol.7)
pp: 462-471
Karin Verspoor, University of Colorado Denver, Aurora
Christophe Roeder, University of Colorado Denver, Aurora
Helen L. Johnson, University of Colorado Denver, Aurora
K. Bretonnel Cohen, University of Colorado Denver, Aurora
William A. Baumgartner Jr., University of Colorado Denver, Aurora
Lawrence E. Hunter, University of Colorado Denver, Aurora
ABSTRACT
We introduce a system developed for the BioCreative II.5 community evaluation of information extraction of proteins and protein interactions. The paper focuses primarily on the gene normalization task of recognizing protein mentions in text and mapping them to the appropriate database identifiers based on contextual clues. We outline a "fuzzy" dictionary lookup approach to protein mention detection that matches regularized text to similarly regularized dictionary entries. We describe several different strategies for gene normalization that focus on species or organism mentions in the text, both globally throughout the document and locally in the immediate vicinity of a protein mention, and present the results of experimentation with a series of system variations that explore the effectiveness of the various normalization strategies, as well as the role of external knowledge sources. While our system was neither the best nor the worst performing system in the evaluation, the gene normalization strategies show promise and the system affords the opportunity to explore some of the variables affecting performance on the BCII.5 tasks.
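The two ideas in the abstract, matching regularized text against regularized dictionary entries and choosing an identifier from local or global species mentions, can be illustrated with a minimal sketch. This is not the authors' system. The regularization rule (lowercasing and dropping non-alphanumeric characters), the toy dictionary, the placeholder identifiers, the species keyword list, the single-token matching, and the window size are all assumptions made for illustration.

```python
import re
from collections import Counter

def regularize(text):
    """Lowercase and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())

# Hypothetical toy dictionary: (name, species, identifier).
# The identifiers are placeholders, not real database accessions.
ENTRIES = [
    ("IL-2", "human", "ID_HUMAN_IL2"),
    ("Il2", "mouse", "ID_MOUSE_IL2"),
]

# Hypothetical species cue words mapped to a species label.
SPECIES_TERMS = {
    "human": "human", "patients": "human",
    "mouse": "mouse", "mice": "mouse", "murine": "mouse",
}

def build_index(entries):
    """Map each regularized name to its set of (species, identifier) pairs."""
    index = {}
    for name, species, identifier in entries:
        index.setdefault(regularize(name), set()).add((species, identifier))
    return index

def normalize(text, index, window=3):
    """Return (mention, species, identifier) for each single-token match.

    The nearest species cue within `window` tokens wins (local strategy);
    otherwise the most frequent species in the text is used (global strategy).
    """
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    cues = [(i, SPECIES_TERMS[t.lower()])
            for i, t in enumerate(tokens) if t.lower() in SPECIES_TERMS]
    counts = Counter(species for _, species in cues)
    global_species = counts.most_common(1)[0][0] if counts else None

    results = []
    for i, token in enumerate(tokens):
        candidates = index.get(regularize(token))
        if not candidates:
            continue
        local = [(abs(i - j), sp) for j, sp in cues if abs(i - j) <= window]
        chosen = min(local)[1] if local else global_species
        ids = sorted(ident for sp, ident in candidates if sp == chosen)
        if ids:
            results.append((token, chosen, ids[0]))
    return results

INDEX = build_index(ENTRIES)
print(normalize("In mice, Il-2 expression rose.", INDEX))
```

In this sketch, the surface forms "IL-2", "Il-2", and "IL2" all regularize to the same key, so the lookup alone is ambiguous between species; the species cue nearest to the mention, or the dominant species in the document when no cue is nearby, decides which identifier is returned. A mention with no species evidence at all is left unnormalized.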
INDEX TERMS
Biomedical natural language processing, information extraction, gene normalization, text mining.
CITATION
Karin Verspoor, Christophe Roeder, Helen L. Johnson, K. Bretonnel Cohen, William A. Baumgartner Jr., Lawrence E. Hunter, "Exploring Species-Based Strategies for Gene Normalization", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.7, no. 3, pp. 462-471, July-September 2010, doi:10.1109/TCBB.2010.48