Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins
July-September 2007 (vol. 4 no. 3)
pp. 441-446
An algorithm called Bidirectional Long Short-Term Memory Networks (BLSTM) for processing sequential data is introduced. This supervised learning method trains a special recurrent neural network to use very long-range, symmetric sequence context (context from both directions along the sequence), combining nonlinear processing elements with linear feedback loops that store long-range context. The algorithm is applied to the sequence-based prediction of protein subcellular localization and correctly predicts the localization of 93.3% of novel non-plant proteins and 88.4% of novel plant proteins, an improvement over feedforward and standard recurrent networks solving the same problem. The BLSTM system is available as a web service (
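The abstract describes the mechanism only in words. The following is a minimal, untrained sketch of a bidirectional LSTM forward pass, written under several assumptions that are not taken from the paper: residues are one-hot encoded, weights are random, the LSTM variant includes a forget gate, and a softmax classifier reads the last forward state together with the first backward state (the two states that have each seen the whole sequence). The class labels, layer sizes, and the names `LSTMCell`, `blstm_states`, and `predict` are hypothetical. The sketch only illustrates how the linear cell-state self-loop and the two reading directions fit together; it is not the authors' implementation.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Hypothetical label set for illustration; not the paper's class definitions.
CLASSES = ["nuclear", "cytoplasmic", "mitochondrial", "extracellular"]


def one_hot(seq):
    """Assumed input encoding: one 20-dimensional indicator vector per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class LSTMCell:
    """Minimal LSTM cell with input, forget, and output gates (random, untrained weights)."""

    def __init__(self, n_in, n_hidden, rng):
        n_cat = n_in + n_hidden

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)] for _ in range(n_hidden)]

        self.n_hidden = n_hidden
        self.w_i, self.w_f, self.w_o, self.w_g = mat(), mat(), mat(), mat()

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in matvec(self.w_i, z)]
        f = [sigmoid(v) for v in matvec(self.w_f, z)]
        o = [sigmoid(v) for v in matvec(self.w_o, z)]
        g = [math.tanh(v) for v in matvec(self.w_g, z)]
        # Linear self-loop on the cell state, c_t = f * c_{t-1} + i * g,
        # which lets information persist over long distances.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        h, c = [0.0] * self.n_hidden, [0.0] * self.n_hidden
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def blstm_states(xs, fwd, bwd):
    """Per-position states that concatenate left (forward) and right (backward) context."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-align backward states with positions
    return [hf + hb for hf, hb in zip(forward, backward)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def predict(seq, fwd, bwd, w_out):
    states = blstm_states(one_hot(seq), fwd, bwd)
    n = fwd.n_hidden
    # Last forward state and first backward state both summarize the full sequence.
    summary = states[-1][:n] + states[0][n:]
    return softmax(matvec(w_out, summary))


rng = random.Random(0)
n_hidden = 8
fwd = LSTMCell(len(AMINO_ACIDS), n_hidden, rng)
bwd = LSTMCell(len(AMINO_ACIDS), n_hidden, rng)
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden)] for _ in CLASSES]
probs = predict("MKTAYIAKQRQISFVKSHFSRQ", fwd, bwd, w_out)
```

Because the weights are random, `probs` is a valid probability distribution over the hypothetical classes but carries no biological meaning; in practice the weights would be fitted by supervised training on sequences with known localization.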

Index Terms:
recurrent neural networks, long short-term memory, biological sequence analysis, protein subcellular localization prediction
Trias Thireou, Martin Reczko, "Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, pp. 441-446, July-Sept. 2007, doi:10.1109/tcbb.2007.1015