Literature Extraction of Protein Functions Using Sentence Pattern Mining
August 2005 (vol. 17 no. 8)
pp. 1088-1098
With the rapid growth of the genomics research literature, biomedical researchers find it increasingly hard to keep up with the ever-growing volume of information and to learn about newly discovered functions of the proteins they study. Text mining has therefore become essential for the functional annotation of proteins: it draws on the huge body of biomedical literature and transforms the knowledge it contains into easily accessible database formats. In this paper, we propose a sentence pattern mining method to extract protein functions from biomedical literature. To recognize variants of function terms correctly, we identify their morphological, syntactic, and semantic variation forms. The proposed methods can help database curators annotate protein functions and can assist biologists and medical researchers in searching the biomedical literature for protein functions.
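
The three kinds of term variation named above can be illustrated with a toy matcher. The sketch below is not the paper's sentence pattern mining method. It is a minimal illustration under stated assumptions: a crude suffix stripper (hypothetical `stem`) stands in for morphological normalization, a small hand-written table (hypothetical `SYNONYMS`) stands in for semantic variants, and an unordered token window (hypothetical `matches_term`, parameter `window`) tolerates syntactic variation such as reordering and inserted function words.

```python
import re

# Assumption: crude suffix list, longest first; not a real stemmer.
SUFFIXES = ("ational", "ation", "ional", "ates", "ated", "ate", "ing", "ion", "ed", "s")
# Assumption: hypothetical synonym table keyed on stems (semantic variants).
SYNONYMS = {"block": "inhibit", "associ": "bind"}
# Function words ignored so that inserted words do not break a match.
STOPWORDS = {"of", "the", "a", "an", "to", "in", "by"}


def stem(word: str) -> str:
    """Strip one suffix so morphological variants share a stem."""
    w = word.lower()
    if w.endswith("ss"):
        return w
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalize(text: str) -> list[str]:
    """Tokenize, drop stopwords, stem, and map stems to a canonical synonym."""
    out = []
    for token in re.findall(r"[A-Za-z0-9-]+", text):
        if token.lower() in STOPWORDS:
            continue
        s = stem(token)
        out.append(SYNONYMS.get(s, s))
    return out


def matches_term(sentence: str, term: str, window: int = 4) -> bool:
    """True if every normalized term token occurs, in any order,
    inside some run of `window` consecutive normalized sentence tokens."""
    term_tokens = set(normalize(term))
    sent = normalize(sentence)
    if not term_tokens:
        return False
    width = max(window, len(term_tokens))
    return any(term_tokens <= set(sent[i:i + width]) for i in range(len(sent)))
```

For example, `matches_term("p53 regulates the transcription of p21", "transcriptional regulation")` returns `True` because both sides normalize to the stems `regul` and `transcript`. The unordered window is a deliberately loose choice: it accepts reorderings that a real pattern-based system would constrain more carefully.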

Index Terms: Text mining, bioinformatics, knowledge acquisition, linguistic processing.
Jung-Hsien Chiang, Hsu-Chun Yu, "Literature Extraction of Protein Functions Using Sentence Pattern Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, pp. 1088-1098, Aug. 2005, doi:10.1109/TKDE.2005.132