Regulatory Motif Discovery Using a Population Clustering Evolutionary Algorithm
July-September 2007 (vol. 4 no. 3)
pp. 403-414
This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
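The clustering-then-local-mating idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering method, distance measure, or recombination operator, so plain k-means, squared Euclidean distance, and uniform column crossover are assumptions made purely for illustration. All function names are hypothetical, and fitness evaluation and selection are omitted.

```python
import random

BASES = "ACGT"

def random_pfm(width, rng):
    """Return a random position frequency matrix: one normalized column per position."""
    pfm = []
    for _ in range(width):
        col = [rng.random() + 1e-9 for _ in BASES]
        total = sum(col)
        pfm.append([v / total for v in col])
    return pfm

def distance(a, b):
    """Squared Euclidean distance between two PFMs of equal width (assumed metric)."""
    return sum((x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb))

def mean_pfm(members):
    """Column-wise mean of a non-empty list of PFMs."""
    n = len(members)
    width = len(members[0])
    return [[sum(m[i][j] for m in members) / n for j in range(4)]
            for i in range(width)]

def cluster(population, k, rng, iterations=10):
    """Plain k-means over PFMs (assumed method); returns non-empty clusters of indices."""
    centroids = rng.sample(population, k)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for idx, ind in enumerate(population):
            nearest = min(range(k), key=lambda c: distance(ind, centroids[c]))
            groups[nearest].append(idx)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_pfm([population[i] for i in g]) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return [g for g in groups if g]

def crossover(a, b, rng):
    """Uniform column crossover: each child column is copied from one parent."""
    return [list(rng.choice((ca, cb))) for ca, cb in zip(a, b)]

def mate_within_clusters(population, clusters, rng):
    """Produce one child per individual, drawing the mate from the same cluster."""
    children = []
    for group in clusters:
        for idx in group:
            mate = rng.choice(group)
            children.append(crossover(population[idx], population[mate], rng))
    return children

rng = random.Random(0)
population = [random_pfm(6, rng) for _ in range(20)]
clusters = cluster(population, k=3, rng=rng)
children = mate_within_clusters(population, clusters, rng)
```

Because mates are drawn only from within an individual's own cluster, each region of the search space recombines separately, which is one simple way to keep several distinct candidate motifs alive in a single population.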

Index Terms:
Evolutionary computation, population-based data clustering, motif discovery, transcription factor binding sites, muscle-specific gene expression
Michael Lones, Andy Tyrrell, "Regulatory Motif Discovery Using a Population Clustering Evolutionary Algorithm," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, pp. 403-414, July-Sept. 2007, doi:10.1109/tcbb.2007.1044