Issue No.04 - July-Aug. (2013 vol.10)
pp: 897-904
Lishuang Li , Coll. of Comput. Sci. & Technol., Dalian Univ. of Technol., Dalian, China
Wenting Fan , Coll. of Comput. Sci. & Technol., Dalian Univ. of Technol., Dalian, China
Degen Huang , Coll. of Comput. Sci. & Technol., Dalian Univ. of Technol., Dalian, China
ABSTRACT
Biomedical named entity recognition (Bio-NER) is a fundamental step in biomedical text mining. This paper presents a two-phase Bio-NER model targeting the JNLPBA task. The two-phase method divides the task into two subtasks: named entity detection (NED) and named entity classification (NEC). In the first phase, the NED subtask distinguishes named entities (NEs) from non-named entities (NNEs) in the biomedical literature without identifying their types. To do this, six classifiers are constructed with four toolkits (CRF++, YamCha, maximum entropy, and Mallet) using different training methods and are integrated by a two-layer stacking method. In the second phase, for the NEC subtask, a multiagent strategy is introduced to determine the correct entity type for the entities identified in the first phase. The experimental results show that the presented approach achieves an F-score of 76.06 percent, which outperforms most state-of-the-art systems.
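The first-phase idea of two-layer stacking for NED can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy detectors stand in for the six classifiers built with CRF++, YamCha, maximum entropy, and Mallet, the second layer is a simple lookup table rather than a trained sequence model, and all function names are hypothetical. The abstract gives no details of the multiagent NEC phase, so it is not sketched here.

```python
from collections import Counter, defaultdict

def bio(flags):
    """Turn per-token entity flags into B/I/O tags."""
    tags, prev = [], False
    for f in flags:
        tags.append("O" if not f else ("I" if prev else "B"))
        prev = f
    return tags

# Toy layer-one detectors (hypothetical stand-ins for the real classifiers).
def has_digit(tokens):
    return bio([any(c.isdigit() for c in t) for t in tokens])

def has_upper(tokens):
    return bio([any(c.isupper() for c in t) for t in tokens])

def in_lexicon(tokens, lexicon=frozenset({"receptor", "kinase", "cells"})):
    return bio([t.lower() in lexicon for t in tokens])

BASE = [has_digit, has_upper, in_lexicon]

def layer_one(tokens):
    """One tuple of base-tagger outputs per token: the layer-two features."""
    return list(zip(*(tagger(tokens) for tagger in BASE)))

def train_layer_two(sentences):
    """Learn, for each pattern of base outputs, the most frequent gold tag."""
    counts = defaultdict(Counter)
    for tokens, gold in sentences:
        for feats, tag in zip(layer_one(tokens), gold):
            counts[feats][tag] += 1
    return {feats: c.most_common(1)[0][0] for feats, c in counts.items()}

def detect(tokens, meta):
    """Stacked prediction; unseen patterns fall back to a majority vote."""
    return [meta.get(f, Counter(f).most_common(1)[0][0])
            for f in layer_one(tokens)]

def spans(tags):
    """Convert B/I/O tags into (start, end) spans, end exclusive."""
    out, start = [], None
    for i, t in enumerate(tags + ["O"]):
        if t != "I" and start is not None:
            out.append((start, i))
            start = None
        if t == "B" or (t == "I" and start is None):
            start = i
    return out

# Usage: train the second layer on one toy sentence, then detect untyped NEs.
train = [("IL-2 receptor binds T cells".split(), ["B", "I", "O", "B", "I"])]
meta = train_layer_two(train)
tags = detect("CD28 kinase in B cells".split(), meta)
print(spans(tags))  # [(0, 2), (3, 5)]
```

The detected spans carry no entity type, which matches the NED/NEC split described above: a separate second-phase component would assign a type to each span.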
INDEX TERMS
text analysis, bioinformatics, classification, data mining, maximum entropy methods, medical computing, multi-agent systems, F-score, two-phase Bio-NER system, integrated classifier, multiagent strategy, biomedical named entity recognition system, biomedical text mining, two-phase Bio-NER model targeting, JNLPBA task, named entity detection subtask, NED subtask, named entity classification subtask, NEC subtask, two-layer stacking method, biomedical literature, CRF++ toolkit, YamCha toolkit, maximum entropy toolkit, Mallet toolkit, toolkit training method, correct entity type determination, Stacking, Biological system modeling, Training, Proteins, Hidden Markov models, RNA, Computational modeling, Named entity recognition and classification, multiagent
CITATION
Lishuang Li, Wenting Fan, Degen Huang, "A Two-Phase Bio-NER System Based on Integrated Classifiers and Multiagent Strategy", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 897-904, July-Aug. 2013, doi:10.1109/TCBB.2013.106