Issue No.04 - October-December (2010 vol.7)
pp: 619-627
Sanguthevar Rajasekaran , University of Connecticut, Storrs
Sahar Al Seesi , University of Connecticut, Storrs
Reda A. Ammar , University of Connecticut, Storrs
ABSTRACT
Formal grammars have been employed in biology to solve various important problems. In particular, grammars have been used to model and predict RNA structures. Two such grammars are Simple Linear Tree Adjoining Grammars (SLTAGs) and Extended SLTAGs (ESLTAGs). The performance of techniques that employ grammatical formalisms depends critically on the efficiency of the underlying parsing algorithms. In this paper, we present efficient algorithms for parsing SLTAGs and ESLTAGs. Our algorithm for parsing SLTAGs takes $O(\min\{m, n^4\})$ time and $O(\min\{m, n^4\})$ space, where $m$ is the number of entries that will ever be made in the matrix $M$ (which is normally used by TAG parsing algorithms). Our algorithm for parsing ESLTAGs takes $O(n \min\{m, n^4\})$ time and $O(\min\{m, n^4\})$ space. We show that these algorithms perform better, in practice, than the algorithms of Uemura et al. [21].
INDEX TERMS
RNA structure analysis, tree adjoining grammars, parsing algorithms.
CITATION
Sanguthevar Rajasekaran, Sahar Al Seesi, Reda A. Ammar, "Improved Algorithms for Parsing ESLTAGs: A Grammatical Model Suitable for RNA Pseudoknots", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.7, no. 4, pp. 619-627, October-December 2010, doi:10.1109/TCBB.2010.54
REFERENCES
[1] S. Al Seesi, S. Rajasekaran, and R. Ammar, "Pseudoknot Identification through Learning $TAG_{RNA}s$," Proc. Int'l Conf. Pattern Recognition in Bioinformatics (PRIB '08), M. Chetty, A. Ngom, and S. Ahmad, eds., pp. 132-143, 2008.
[2] S. Al Seesi, S. Rajasekaran, and R. Ammar, "RNA Pseudoknot Folding through Inference Identification Using $TAG_{RNA}s$," Proc. Int'l Conf. Bioinformatics and Computational Biology (BiCoB '09), pp. 90-101, 2009.
[3] F.H.D. Batenburg, A.P. Gultyaev, C.W.A. Pleij, J. Ng, and J. Oliehoek, "Pseudobase: A Database with RNA Pseudoknots," Nucleic Acids Research, vol. 28, no. 1, pp. 201-204, 2000.
[4] Y. Byun and K. Han, "PseudoViewer: Web Application and Web Service for Visualizing RNA Pseudoknots and Secondary Structures," Nucleic Acids Research, vol. 34, pp. W416-W422, 2006.
[5] D. Coppersmith and S. Winograd, "Matrix Multiplication via Arithmetic Progressions," J. Symbolic Computation, vol. 9, pp. 251-280, 1990.
[6] S. Griffiths-Jones, S. Moxon, M. Marshall, A. Khanna, S.R. Eddy, and A. Bateman, "Rfam: Annotating Non-Coding RNAs in Complete Genomes," Nucleic Acids Research, vol. 33, pp. D121-D124, 2005.
[7] Y. Guan and G. Hotz, "An $O(n^5)$ Recognition Algorithm for Coupled Parenthesis Rewriting Systems," Proc. TAG+ Workshop, 1992.
[8] K. Harbusch, "An Efficient Parsing Algorithm for Tree Adjoining Grammars," Proc. 28th Meeting of the Assoc. for Computational Linguistics, pp. 284-291, 1990.
[9] A.K. Joshi, L.S. Levy, and M. Takahashi, "Tree Adjunct Grammars," J. Computer and System Sciences, vol. 10, no. 1, pp. 136-163, 1975.
[10] H. Matsui, K. Sato, and Y. Sakakibara, "Pair Stochastic Tree Adjoining Grammars for Aligning and Predicting Pseudoknot RNA Structures," Bioinformatics, vol. 21, no. 11, pp. 2611-2617, 2005.
[11] T. Nurkkala and V. Kumar, "A Parallel Parsing Algorithm for Natural Language Using Tree Adjoining Grammar," Proc. Eighth Int'l Parallel Processing Symp., 1994.
[12] J.C. Paillart, E. Skripkin, B. Ehresmann, C. Ehresmann, and R. Marquet, "In Vitro Evidence for a Long Range Pseudoknot in the 5'-Untranslated and Matrix Coding Regions of HIV-1 Genomic RNA," J. Biological Chemistry, vol. 277, pp. 5995-6004, 2002.
[13] M. Palis, S. Shende, and D.S.L. Wei, "An Optimal Linear Time Parallel Parser for Tree Adjoining Languages," SIAM J. Computing, vol. 19, no. 1, pp. 1-31, 1990.
[14] B.H. Partee, A. Ter Meulen, and R.E. Wall, Studies in Linguistics and Philosophy, vol. 30, Kluwer Academic Publishers, 1990.
[15] S. Rajasekaran, "TAL Parsing in $o(n^6)$ Time," SIAM J. Computing, vol. 25, no. 4, pp. 862-873, 1996.
[16] S. Rajasekaran and S. Yooseph, "TAL Parsing in $O(M(n^2))$ Time," J. Computer and System Sciences, vol. 56, no. 1, pp. 83-89, 1998.
[17] Y. Sakakibara, M. Brown, R. Hughey, I. Mian, K. Sjolander, R.C. Underwood, and D. Haussler, "Stochastic Context-Free Grammars for tRNA Modeling," Nucleic Acids Research, vol. 22, pp. 5112-5120, 1994.
[18] G. Satta, "Tree Adjoining Grammar Parsing and Boolean Matrix Multiplication," Proc. 32nd Meeting of the Assoc. for Computational Linguistics, 1994.
[19] Y. Schabes and A.K. Joshi, "An Earley-Type Parsing Algorithm for Tree Adjoining Grammars," Proc. 26th Meeting of the Assoc. for Computational Linguistics, pp. 258-269, 1988.
[20] M. Taufer, A. Licon, R. Araiza, D. Mireles, F.H.D. Batenburg, A.P. Gultyaev, and M.-Y. Leung, "PseudoBase++: An Extension of PseudoBase for Easy Searching, Formatting and Visualization of Pseudoknots," Nucleic Acids Research, Database Issue, vol. 37, pp. D127-D135, 2009.
[21] Y. Uemura, A. Hasegawa, S. Kobayashi, and T. Yokomori, "Tree Adjoining Grammars for RNA Structure Prediction," Theoretical Computer Science, vol. 210, pp. 277-303, 1999.
[22] K. Vijayashanker and A.K. Joshi, "Some Computational Properties of Tree Adjoining Grammars," Proc. 23rd Meeting of the Assoc. for Computational Linguistics, pp. 82-93, 1985.
[23] K.P. Williams, "The tmRNA website: Invasion by an Intron," Nucleic Acids Research, vol. 30, no. 1, pp. 179-182, 2002.