July-September 2010 (vol. 7 no. 3)
pp. 385-399
Florian Leitner, Spanish National Cancer Research Centre (CNIO), Madrid
Scott A. Mardis, MITRE Corporation, Bedford
Martin Krallinger, Spanish National Cancer Research Centre (CNIO), Madrid
Gianni Cesareni, University of Rome Tor Vergata, Rome
Lynette A. Hirschman, MITRE Corporation, Bedford
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Madrid
We present the results of the BioCreative II.5 evaluation in association with the FEBS Letters experiment, where authors created Structured Digital Abstracts to capture information about protein-protein interactions. The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams based on a gold standard created by reconciling annotations from curators, authors, and automated systems. The tasks were to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the positive articles (61); and to identify interacting protein pairs. There were 595 full-text articles in the evaluation test set, including those both with and without curatable protein interactions. The principal evaluation metrics were the interpolated area under the precision/recall curve (AUC iP/R), and (balanced) F-measure. For article classification, the best AUC iP/R was 0.70; for interacting proteins, the best system achieved good macroaveraged recall (0.73) and interpolated area under the precision/recall curve (0.58), after filtering incorrect species and mapping homonymous orthologs; for interacting protein pairs, the top (filtered, mapped) recall was 0.42 and AUC iP/R was 0.29. Ensemble systems improved performance for the interacting protein task.
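The two principal metrics named above can be illustrated with a minimal Python sketch. It assumes one common formulation: the interpolated precision at a recall level is the highest precision observed at that recall or any higher recall, and the area is summed over the recall steps contributed by each correct result. The function names (f_measure, auc_ipr) and the toy ranking are hypothetical, and the official BioCreative II.5 evaluation may differ in details such as tie handling.

```python
from typing import Sequence


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_ipr(ranked_hits: Sequence[bool], total_positives: int) -> float:
    """Area under the interpolated precision/recall curve of a ranked list.

    ranked_hits[i] is True when the result at rank i + 1 is correct;
    total_positives is the number of correct items in the gold standard.
    """
    if total_positives == 0:
        return 0.0
    # Raw precision at every rank where a correct result is found.
    precisions = []
    tp = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)
    # Interpolation: walk from the highest recall down, carrying the
    # maximum precision seen so far (i.e., at this or any higher recall).
    area = 0.0
    best = 0.0
    for p in reversed(precisions):
        best = max(best, p)
        # Each correct result advances recall by 1 / total_positives.
        area += best / total_positives
    return area


if __name__ == "__main__":
    # Toy ranking (hypothetical): correct results at ranks 1, 3, and 4.
    hits = [True, False, True, True]
    print(round(auc_ipr(hits, total_positives=3), 4))
    print(round(f_measure(tp=2, fp=1, fn=1), 4))
```

Under this formulation, correct results that a system never returns add nothing to the area, so a ranked list that misses gold-standard items is penalized even if every returned result is correct.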


Index Terms:
Text mining, text analysis, natural language processing, molecular biology, biological curation.
Citation:
Florian Leitner, Scott A. Mardis, Martin Krallinger, Gianni Cesareni, Lynette A. Hirschman, Alfonso Valencia, "An Overview of BioCreative II.5," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 385-399, July-Sept. 2010, doi:10.1109/TCBB.2010.61