Issue No. 11 - Nov. (2015 vol. 26)
Kenli Li, College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
Wei Ai, College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
Zhuo Tang, College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
Fan Zhang, Kavli Institute for Astrophysics and Space Research, Massachusetts Institute of Technology, Cambridge, MA
Lingang Jiang, College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
Keqin Li, Department of Computer Science, State University of New York, New Paltz, New York
Kai Hwang, Department of Electrical Engineering, University of Southern California, Los Angeles, CA
Processing large volumes of data is challenging, particularly in data-redundant systems. The conditional random fields (CRF) model is one of the most recognized models and has been widely applied in biomedical named entity recognition (Bio-NER). Because the CRF model is inherently sequential, improving its performance is nontrivial and requires new parallelized solutions. By combining and parallelizing the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and Viterbi algorithms, we propose a parallel CRF algorithm called MapReduce CRF (MRCRF), which contains two parallel sub-algorithms that handle the two time-consuming steps of the CRF model. The MapReduce L-BFGS (MRLB) algorithm leverages the MapReduce framework to improve parameter estimation. The MapReduce Viterbi (MRVtb) algorithm infers the most likely state sequence by extending the Viterbi algorithm with another MapReduce job. Experimental results show that the MRCRF algorithm outperforms competing methods, delivering a significant improvement in time efficiency while preserving a guaranteed level of correctness.
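For readers unfamiliar with the inference step, the following is a minimal single-machine sketch of standard Viterbi decoding over log scores, not the MRVtb algorithm itself. The function names `viterbi` and `decode_all`, and the score layout (`emission[t][s]`, `transition[p][s]`), are illustrative assumptions that do not come from the paper. The `decode_all` helper only hints at why a map-style job is a natural fit: each sentence can be decoded independently of the others. How the paper actually partitions the work is not stated in the abstract.

```python
from typing import List, Sequence

Scores = Sequence[Sequence[float]]


def viterbi(emission: Scores, transition: Scores) -> List[int]:
    """Return the highest-scoring state sequence.

    emission[t][s]   : log score of state s at position t (assumed layout)
    transition[p][s] : log score of moving from state p to state s
    """
    n_states = len(transition)
    score = list(emission[0])          # best score of a path ending in each state
    backptrs: List[List[int]] = []     # backptrs[t-1][s] = best predecessor of s at t

    for t in range(1, len(emission)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Pick the predecessor that maximizes the path score into state s.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + transition[p][s])
            new_score.append(score[best_prev] + transition[best_prev][s]
                             + emission[t][s])
            ptr.append(best_prev)
        backptrs.append(ptr)
        score = new_score

    # Trace back from the best final state.
    path = [max(range(n_states), key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def decode_all(sentences: Sequence[Scores], transition: Scores) -> List[List[int]]:
    """Hypothetical 'map' step: decode each sentence independently."""
    return [viterbi(emission, transition) for emission in sentences]
```

In this sketch the per-sentence independence is what would let a framework such as MapReduce distribute decoding across workers; the sequential dependency remains inside each call to `viterbi`.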
Viterbi algorithm, Training, Biological system modeling, Inference algorithms, Hidden Markov models, Training data, Vectors
K. Li et al., "Hadoop Recognition of Biomedical Named Entity Using Conditional Random Fields," in IEEE Transactions on Parallel & Distributed Systems, vol. 26, no. 11, pp. 3040-3051, 2015.