Issue No. 04, July/August 2011 (vol. 8), pp. 959-975
Chao-Wen Huang, National Cheng Kung University, Tainan
Wun-Shiun Lee, National Cheng Kung University, Tainan
Sun-Yuan Hsieh, National Cheng Kung University, Tainan
The planted (l,d)-motif search problem is a mathematical abstraction of the DNA functional site discovery task. In this paper, we propose a heuristic algorithm that can find planted (l,d)-signals in a given set of DNA sequences. Evaluations on simulated data sets demonstrate that the proposed algorithm outperforms currently widely used motif-finding algorithms. We also report the results of experiments on real biological data sets.
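To make the problem statement concrete, here is a minimal Python sketch, assuming the common formulation in which a planted (l,d)-motif is an l-mer M such that every input sequence contains at least one l-mer within Hamming distance d of M. It shows how a simulated instance can be generated by planting mutated copies of a random motif, and how a candidate can be verified. The exhaustive search is a brute-force baseline that is exponential in l; it is not the heuristic proposed in the paper. All function names (`plant_instance`, `is_planted_motif`, `exhaustive_search`) are hypothetical and chosen for illustration.

```python
import random
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some l-mer of seq lies within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def is_planted_motif(motif, seqs, d):
    """True if motif has an occurrence with at most d mismatches in every sequence."""
    return all(occurs_within(motif, s, d) for s in seqs)


def plant_instance(t, n, l, d, seed=None):
    """Hypothetical generator: return a random l-mer and t random length-n
    sequences, each containing one copy of it mutated at exactly d positions."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs = []
    for _ in range(t):
        seq = [rng.choice(ALPHABET) for _ in range(n)]
        site = list(motif)
        for pos in rng.sample(range(l), d):
            # Substitute a different base so the site differs in exactly d places.
            site[pos] = rng.choice([c for c in ALPHABET if c != site[pos]])
        start = rng.randrange(n - l + 1)
        seq[start:start + l] = site
        seqs.append("".join(seq))
    return motif, seqs


def exhaustive_search(seqs, l, d):
    """Brute-force baseline: test all 4**l candidate l-mers.
    May also return spurious motifs that match every sequence by chance."""
    hits = []
    for letters in product(ALPHABET, repeat=l):
        candidate = "".join(letters)
        if is_planted_motif(candidate, seqs, d):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    motif, seqs = plant_instance(t=5, n=60, l=6, d=1, seed=0)
    found = exhaustive_search(seqs, l=6, d=1)
    print("planted:", motif, "recovered:", motif in found, "candidates:", len(found))
```

Because every one of the 4^l candidates is checked against all t sequences and all n - l + 1 windows, this baseline quickly becomes impractical as l grows, which is the gap that heuristic approaches such as the one described above aim to close.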
Index Terms: Planted motif search problem, heuristic algorithms, (l,d)-signals.
Chao-Wen Huang, Wun-Shiun Lee, Sun-Yuan Hsieh, "An Improved Heuristic Algorithm for Finding Motif Signals in DNA Sequences", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 959-975, July/August 2011, doi:10.1109/TCBB.2010.92