MinePhos: A Literature Mining System for Protein Phosphorylation Information Extraction
January/February 2012 (vol. 9 no. 1)
pp. 311-315
Yun Xu, University of Science and Technology of China, Hefei and Anhui Province Key Laboratory of High Performance Computing, Hefei
Da Teng, University of Science and Technology of China, Hefei and Anhui Province Key Laboratory of High Performance Computing, Hefei
Yiming Lei, University of Science and Technology of China, Hefei and Anhui Province Key Laboratory of High Performance Computing, Hefei
The rapid growth of the scientific literature calls for automatic, efficient ways to extract experimental data on protein phosphorylation. Such information is of great value to biologists studying cellular processes and diseases such as cancer and diabetes. Existing approaches such as RLIMS-P are mainly rule-based, so their performance depends heavily on the completeness of the rules. We propose an SVM-based system, MinePhos, which outperforms RLIMS-P in both precision and recall of information extraction when tested on a set of articles randomly chosen from PubMed.
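
The comparison above is stated in terms of precision and recall. As a minimal sketch of how these two metrics are usually computed for an extraction task, the snippet below scores a set of extracted records against a gold-standard set. The function name, the (kinase, substrate, site) record format, and the sample records are illustrative assumptions, not the paper's evaluation code or data.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted records against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted records that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold records that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (kinase, substrate, site) records, for illustration only.
gold = {
    ("PKA", "CREB", "Ser133"),
    ("CDK1", "Lamin A", "Ser22"),
    ("AKT1", "GSK3B", "Ser9"),
}
extracted = {
    ("PKA", "CREB", "Ser133"),
    ("AKT1", "GSK3B", "Ser9"),
    ("PKC", "CREB", "Ser133"),  # wrong kinase, counts as a false positive
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three extracted records are correct and two of the three gold records are recovered, so both metrics are 2/3. A system is better on both counts only if it returns fewer wrong records and misses fewer true ones.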

Index Terms:
Phosphorylation, Phospho.ELM, SVM, literature mining.
Yun Xu, Da Teng, Yiming Lei, "MinePhos: A Literature Mining System for Protein Phosphorylation Information Extraction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 1, pp. 311-315, Jan.-Feb. 2012, doi:10.1109/TCBB.2011.85