Issue No. 01 - January/February (2012 vol. 9)
pp: 311-315
Yun Xu , Anhui Province Key Lab. of High Performance Comput., Univ. of Sci. & Technol. of China, Hefei, China
Da Teng , Anhui Province Key Lab. of High Performance Comput., Univ. of Sci. & Technol. of China, Hefei, China
Yiming Lei , Anhui Province Key Lab. of High Performance Comput., Univ. of Sci. & Technol. of China, Hefei, China
ABSTRACT
The rapid growth of the scientific literature calls for automatic and efficient ways to extract experimental data on protein phosphorylation. Such information is of great value to biologists studying cellular processes and diseases such as cancer and diabetes. Existing approaches such as RLIMS-P are mainly rule based, so their performance depends heavily on the completeness of their rules. We propose MinePhos, an SVM-based system that outperforms RLIMS-P in both precision and recall of information extraction when tested on a set of articles randomly chosen from PubMed.
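The comparison with RLIMS-P above is stated in terms of precision and recall. As a minimal sketch, assuming each extracted fact is represented as a (kinase, substrate, site) triple and counted as correct only on an exact match (a representation chosen here for illustration; the abstract does not state the paper's matching criteria), the two metrics and their F1 combination can be computed as below. `PhosphoRecord` and `precision_recall` are hypothetical names, and the records are placeholders, not data from the study.

```python
from typing import NamedTuple, Set, Tuple


class PhosphoRecord(NamedTuple):
    """Hypothetical unit of extracted information: (kinase, substrate, site)."""
    kinase: str
    substrate: str
    site: str


def precision_recall(predicted: Set[PhosphoRecord],
                     gold: Set[PhosphoRecord]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted records against a gold set."""
    true_pos = len(predicted & gold)  # records that match a gold record exactly
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder records for illustration only.
    gold = {
        PhosphoRecord("kinase_A", "substrate_B", "Ser10"),
        PhosphoRecord("kinase_C", "substrate_D", "Thr20"),
        PhosphoRecord("kinase_E", "substrate_F", "Tyr30"),
    }
    predicted = {
        PhosphoRecord("kinase_A", "substrate_B", "Ser10"),  # correct
        PhosphoRecord("kinase_C", "substrate_D", "Thr99"),  # wrong site, so a false positive
    }
    p, r, f = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.50 recall=0.33 f1=0.40
```

Under exact matching, a record with the right kinase and substrate but the wrong site lowers precision (it is a false positive) and also leaves its gold record unmatched, which lowers recall; a looser matching scheme would score it differently.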
INDEX TERMS
Proteins, Substrates, Data mining, Dictionaries, Abstracts, Databases, Bioinformatics, literature mining, Phosphorylation, Phospho.ELM, SVM
CITATION
Yun Xu, Da Teng, Yiming Lei, "MinePhos: A Literature Mining System for Protein Phosphorylation Information Extraction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 1, pp. 311-315, January/February 2012, doi:10.1109/TCBB.2011.85