Grammatical Inference in Bioinformatics
July 2005 (vol. 27 no. 7)
pp. 1051-1062
Bioinformatics is an active research area aimed at developing intelligent systems for analysis in molecular biology. Many methods based on formal language theory, statistical theory, and learning theory have been developed for modeling and analyzing biological sequences such as DNA, RNA, and proteins. In particular, grammatical inference methods are expected to uncover grammatical structures hidden in biological sequences. In this article, we give an overview of our series of grammatical approaches to biological sequence analysis and of related research, focusing on learning stochastic grammars from biological sequences and on predicting the functions of those sequences from the learned grammars.

Index Terms: Grammatical inference, bioinformatics, molecular biology, hidden Markov model, stochastic context-free grammar.
Yasubumi Sakakibara, "Grammatical Inference in Bioinformatics," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1051-1062, July 2005, doi:10.1109/TPAMI.2005.140