Issue No. 06, Nov.-Dec. 2012 (vol. 9)
pp: 1639-1648
I. B. Ozyurt , Dept. of Psychiatry, Univ. of California, San Diego, La Jolla, CA, USA
The accelerating growth of the biomedical literature makes keeping up with recent advances challenging for researchers, which makes automatic extraction and discovery of knowledge from this vast literature a necessity. Building such systems requires automatic detection of lexico-semantic event structures in the sentences of biomedical texts. These structures are governed by the syntactic and semantic constraints of human languages. The lexico-semantic event structures in sentences are centered around predicates. However, most semantic role labeling (SRL) approaches focus only on the arguments of verb predicates and neglect argument-taking nouns, which also convey information in a sentence. This article introduces a noun argument structure (NAS) annotated corpus named BioNom and an SRL system that identifies and classifies these structures. It also introduces a genetic algorithm-based feature selection (GAFS) method and applies global inference, which significantly improves the performance of the NAS Bio SRL system.
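The abstract names GAFS but does not describe its operators or settings. As an illustration only, here is a generic genetic-algorithm wrapper for feature selection. Each chromosome is a bit mask over candidate features. The `fitness` callable would, in a system like this one, score a classifier trained on the selected features, for example an SVM's F1 on held-out data. Everything in this sketch is an assumption rather than the article's method: tournament selection, one-point crossover, bit-flip mutation, elitism, the parameter defaults, and the hypothetical names `ga_feature_selection` and `toy_fitness`.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    tournament_size: int = 3,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Hypothetical GA wrapper: returns (best mask, its fitness, best fitness per generation)."""
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(mask: Mask) -> float:
        # Fitness is usually the expensive step (training and evaluating a
        # classifier), so each distinct mask is evaluated only once.
        if mask not in cache:
            cache[mask] = fitness(mask)
        return cache[mask]

    def tournament(pop: Sequence[Mask]) -> Mask:
        # Return the fittest of a few randomly drawn individuals.
        return max(rng.sample(list(pop), tournament_size), key=score)

    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    best = max(population, key=score)
    history = [score(best)]

    for _ in range(generations):
        next_pop: List[Mask] = [best]  # elitism: the best mask so far always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation: toggle each feature independently.
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b for b in child)
            next_pop.append(child)
        population = next_pop
        best = max(population, key=score)  # ties keep the elite at index 0
        history.append(score(best))

    return best, score(best), history


if __name__ == "__main__":
    # Toy stand-in for a real evaluation: reward agreement with a made-up
    # "ideal" subset. A real fitness would train and score a classifier.
    target = (1, 0, 1, 1, 0, 0, 1, 0)

    def toy_fitness(mask: Mask) -> float:
        return sum(m == t for m, t in zip(mask, target))

    best, best_score, history = ga_feature_selection(len(target), toy_fitness)
    print(best, best_score, history[-1])
```

Because of elitism, the best fitness recorded per generation never decreases. Memoizing the fitness matters when each evaluation means retraining a model on a different feature subset.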
Semantics, Support vector machines, Genetic algorithms, Biological cells, Syntactics, Natural language processing, Text mining, biomedical text mining, semantic role labeling, nominalizations
I. B. Ozyurt, "Automatic Identification and Classification of Noun Argument Structures in Biomedical Literature", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1639-1648, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.111