The Applicability of Recurrent Neural Networks for Biological Sequence Analysis
July-September 2005 (vol. 2 no. 3)
pp. 243-253
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited, using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed-forward networks, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
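
The architectural contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Elman-style recurrence, the one-hot residue encoding, the layer sizes, the random weights, and every function name below are assumptions chosen only for illustration. The sketch shows that a feed-forward layer is tied to a fixed window length, while a recurrent layer reuses the same weights at each position and so accepts sequences of any length.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed encoding)


def one_hot(residue):
    """Encode one residue as a 20-dimensional one-hot vector."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def random_matrix(rows, cols, rng, scale=0.5):
    """Untrained weights, drawn uniformly; purely illustrative."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def feed_forward_hidden(window, w_in):
    """Feed-forward hidden layer: sees one fixed-length window all at once."""
    flat = [x for residue in window for x in one_hot(residue)]
    if len(flat) != len(w_in[0]):
        raise ValueError("window length does not match the weight matrix")
    return [math.tanh(a) for a in matvec(w_in, flat)]


def recurrent_hidden(sequence, w_in, w_rec):
    """Elman-style recurrence: the same weights are applied at every position."""
    state = [0.0] * len(w_rec)
    for residue in sequence:
        drive = matvec(w_in, one_hot(residue))
        feedback = matvec(w_rec, state)
        state = [math.tanh(d + f) for d, f in zip(drive, feedback)]
    return state


rng = random.Random(0)
HIDDEN, WINDOW = 4, 5  # hypothetical sizes
w_ff = random_matrix(HIDDEN, WINDOW * len(AMINO_ACIDS), rng)
w_in = random_matrix(HIDDEN, len(AMINO_ACIDS), rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)

short_state = recurrent_hidden("MKT", w_in, w_rec)          # length 3
long_state = recurrent_hidden("MKTAYIAKQR", w_in, w_rec)    # length 10
window_state = feed_forward_hidden("MKTAY", w_ff)           # exactly WINDOW residues
```

Both recurrent calls share one set of weights even though the inputs differ in length, whereas the feed-forward layer would need a new weight matrix for any other window size. Because the recurrent state is updated residue by residue, it also depends on residue order, which is one informal way to read the sequence bias the abstract refers to.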

Index Terms: Machine learning, neural network architecture, recurrent neural network, bias, biological sequence analysis, motif, subcellular localization, pattern recognition, classifier design.
John Hawkins, Mikael Bodén, "The Applicability of Recurrent Neural Networks for Biological Sequence Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 3, pp. 243-253, July-Sept. 2005, doi:10.1109/TCBB.2005.44