Issue No.01 - January-March (2008 vol.5)
pp: 1-14
ABSTRACT
In previous work, we presented GAMI [1], an approach to motif inference that uses a genetic algorithms search. GAMI is designed specifically to find putative conserved regulatory motifs in noncoding regions of divergent species, and is designed to allow for analysis of long nucleotide sequences. In this work, we compare GAMI's performance when run with its original fitness function (a simple count of the number of matches) and when run with information content, as well as several variations on these metrics. Results indicate that information content does not identify highly conserved regions, and thus is not the appropriate metric for this task, while variations on information content as well as the original metric succeed in identifying putative conserved regions.
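The two families of metrics compared in the abstract can be illustrated with a minimal sketch. This is not GAMI's implementation: the function names, the mismatch tolerance, and the uniform background frequencies are assumptions made for illustration. The first function counts how many sequences contain a match to a candidate motif; the second computes information content as the relative entropy of each aligned column against background base frequencies.

```python
import math
from collections import Counter


def match_count(motif, sequences, max_mismatches=1):
    """Count sequences with at least one window within max_mismatches of motif.

    The mismatch tolerance is an illustrative assumption, not a GAMI parameter.
    """
    k = len(motif)
    count = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            mismatches = sum(a != b for a, b in zip(motif, window))
            if mismatches <= max_mismatches:
                count += 1
                break  # count each sequence at most once
    return count


def information_content(instances, background=None):
    """Sum over columns of sum_b f_b * log2(f_b / p_b), in bits.

    instances: equal-length motif occurrences, one per sequence.
    background: base -> frequency; uniform 0.25 is assumed if omitted.
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    n = len(instances)
    total = 0.0
    for column in zip(*instances):
        for base, c in Counter(column).items():
            f = c / n
            total += f * math.log2(f / background[base])
    return total
```

Under these assumptions, a perfectly conserved column contributes 2 bits and a column with all four bases equally represented contributes 0 bits, so `information_content(["ACGT", "ACGT", "ACGT"])` is 8 bits, while `match_count` reports only how many sequences contain the motif.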
INDEX TERMS
Evolutionary computing and genetic algorithms, Biology and genetics
CITATION
Clare Bates Congdon, Joseph C. Aman, Gerardo M. Nava, H. Rex Gaskins, Carolyn J. Mattingly, "An Evaluation of Information Content as a Metric for the Inference of Putative Conserved Noncoding Regions in DNA Sequences Using a Genetic Algorithms Approach", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 1, pp. 1-14, January-March 2008, doi:10.1109/TCBB.2007.1059
REFERENCES
[1] C.B. Congdon, C.W. Fizer, N.W. Smith, H.R. Gaskins, J. Aman, G.M. Nava, and C. Mattingly, “Preliminary Results for GAMI: A Genetic Algorithms Approach to Motif Inference,” Proc. 2005 IEEE Symp. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), pp. 97-104, 2005.
[2] V. Matys, E. Fricke, R. Geffers, E. Gossling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A.E. Kel, O.V. Kel-Morgoulis, D.U. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Munch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, and E. Wingender, “Transfac: Transcriptional Regulation, from Patterns to Profiles,” Nucleic Acids Research, vol. 31, pp. 374-378, 2003.
[3] L.A. Pennacchio and E.M. Rubin, “Comparative Genomic Tools and Databases: Providing Insights into the Human Genome,” J. Clinical Investigation, vol. 111, pp. 1099-1106, 2003.
[4] J.W. Thomas and J.W. Touchman, “Vertebrate Genome Sequencing: Building a Backbone for Comparative Genomics,” Trends in Genetics, vol. 18, pp. 104-108, 2002.
[5] J.W. Thomas, J.W. Touchman, R.W. Blakesley, G.G. Bouffard, S.M. Beckstrom-Sternberg, and E.H. Margulies, “Comparative Analyses of Multi-Species Sequences from Targeted Genomic Regions,” Nature, vol. 424, pp. 788-793, 2003.
[6] I. Dubchak, M. Brudno, G.G. Loots, L. Pachter, C. Mayor, E.M. Rubin, and K.A. Frazer, “Active Conservation of Non-Coding Sequences Revealed by Three-Way Species Comparisons,” Genome Research, vol. 10, pp. 1304-1306, 2000.
[7] S. Santini, J.L. Boore, and A. Meyer, “Evolutionary Conservation of Regulatory Elements in Vertebrate Hox Gene Clusters,” Genome Research, vol. 13, pp. 1111-1122, 2003.
[8] S. Aparicio, A. Morrison, A. Gould, J. Gilthorpe, C. Chaudhuri, P. Rigby, R. Krumlauf, and S. Brenner, “Detecting Conserved Regulatory Elements with the Model Genome of the Japanese Puffer Fish, Fugu Rubripes,” Proc. Nat'l Academy of Sciences, vol. 92, pp. 1684-1688, 1995.
[9] J.D. Hughes, P.W. Estep, S. Tavazoie, and G.M. Church, “Computational Identification of CIS-Regulatory Elements Associated with Groups of Functionally Related Genes in Saccharomyces cerevisiae,” J. Molecular Biology, vol. 296, pp. 1205-1214, 2000.
[10] G.G. Loots, R.M. Locksley, C.M. Blankespoor, Z.E. Wang, W. Miller, E.M. Rubin, and K.A. Frazer, “Identification of a Coordinate Regulator of Interleukins 4, 13 and 5 by Cross-Species Sequence Comparisons,” Science, vol. 288, pp. 136-140, 2000.
[11] M. Tompa, N. Li, T. Bailey, G. Church, B. De Moor, E. Eskin, A. Favorov, M. Frith, Y. Fu, J. Kent, V. Makeev, A. Mironov, W. Noble, G. Pavesi, G. Pesole, and M. Ry, “An Assessment of Computational Tools for the Discovery of Transcription Factor Binding Sites,” Nature Biotechnology, vol. 23, no. 1, pp. 137-144, Jan. 2005.
[12] K. Cartharius, K. Frech, K. Grote, B. Klocke, M. Haltmeier, A. Klingenhoff, M. Frisch, M. Bayerlein, and T. Werner, “Matinspector and Beyond: Promoter Analysis Based on Transcription Factor Binding Sites,” Bioinformatics, vol. 21, no. 13, pp. 2933-2942, 2005.
[13] M.A. Lones and A.M. Tyrrell, “The Evolutionary Computation Approach to Motif Discovery in Biological Sequences,” Proc. ACM SIGEVO Genetic and Evolutionary Computation Conf. (GECCO '05); Workshop Biological Applications of Genetic and Evolutionary Computation, 2005.
[14] D. Corne, A. Meade, and R. Sibly, “Evolving Core Promoter Signal Motifs,” Proc. 2001 Congress on Evolutionary Computation (CEC '01), pp. 1162-1169, 2001.
[15] A. Meade, D. Corne, and R. Sibly, “Discovering Patterns in Microsatellite Flanks with Evolutionary Computation by Evolving Discriminatory DNA Motifs,” Proc. 2002 Congress Evolutionary Computation (CEC '02), pp. 1-6, 2002.
[16] J.D. Thompson, D.G. Higgins, and T.J. Gibson, “CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice,” Nucleic Acids Research, vol. 22, no. 22, 1994.
[17] S. Schwartz, Z. Zhang, K.A. Frazer, A. Smit, C. Riemer, J. Bouck, R. Gibbs, R. Hardison, and W. Miller, “PipMaker—A Web Server for Aligning Two Genomic DNA Sequences,” Genome Research, vol. 10, no. 4, pp. 577-586, Apr. 2000.
[18] T.L. Bailey and C. Elkan, “Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers,” Proc. Second Int'l Conf. Intelligent Systems for Molecular Biology, pp. 28-36, 1994.
[19] W. Thompson, E.C. Rouchka, and C.E. Lawrence, “Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites,” Nucleic Acids Research, vol. 31, no. 13, pp. 3580-3585, 2003.
[20] G.B. Fogel, D.G. Weekes, G. Varga, H.B. Harlow, J.E. Onyia, and C. Su, “Discovery of Sequence Motifs Related to Co-Expression of Genes Using Evolutionary Computation,” Nucleic Acids Research, vol. 32, no. 13, pp. 3826-3835, 2004.
[21] C.F. Higgins, “ABC Transporters: From Microorganisms to Man,” Ann. Rev. of Cell Biology, vol. 8, pp. 67-113, 1992.
[22] M. Dean, A. Rzhetsky, and R. Allikmets, “The Human ATP-Binding Cassette (ABC) Transporter Superfamily,” Genome Research, vol. 11, pp. 1156-1166, 2001.
[23] E.M. Leslie, R.G. Deeley, and S.P. Cole, “Toxicological Relevance of the Multidrug Resistance Protein 1, Mrp1 (ABCC1) and Related Transporters,” Toxicology, vol. 167, pp. 3-23, 2001.
[24] J.D. Hayes, J.U. Flanagan, and I.R. Jowsey, “Glutathione Transferases,” Ann. Rev. of Pharmacology and Toxicology, vol. 45, pp. 51-88, 2005.
[25] C.C. McIlwain, D.M. Townsend, and K.D. Tew, “Glutathione S-Transferase Polymorphisms: Cancer Incidence and Therapy,” Oncogene, vol. 25, no. 11, pp. 1639-1648, 2006.
[26] A. Woolfe et al., “Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development,” PLoS Biology, vol. 3, no. e7, pp. 116-130, 2005.
[27] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[28] L. Davis, Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991.
[29] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, 1996.
[30] V. Curwen, E. Eyras, T.D. Andrews, L. Clarke, E. Mongin, S.M. Searle, and M. Clamp, “The Ensembl Automatic Gene Annotation System,” Genome Research, vol. 14, pp. 942-950, 2004.
[31] K.D. Pruitt, T. Tatusov, and D.R. Maglott, “NCBI Reference Sequence (REFSEQ): A Curated Non-Redundant Sequence Database of Genomes, Transcripts and Proteins,” Nucleic Acids Research, vol. 33, pp. D501-D504, 2005.
[32] A. Marchler-Bauer and S.H. Bryant, “CD-Search: Protein Domain Annotations on the Fly,” Nucleic Acids Research, vol. 32, pp. W327-W331, 2004.
[33] T.F. Smith and M.S. Waterman, “Identification of Common Molecular Subsequences,” J. Molecular Biology, vol. 147, pp. 195-197, 1981.
[34] M. Clamp, J. Cuff, S.M. Searle, and G.J. Barton, “The Jalview Java Alignment Editor,” Bioinformatics, vol. 20, pp. 426-427, 2004.
[35] W.R. Pearson, “Searching Protein Sequence Libraries: Comparison of the Sensitivity and Selectivity of the Smith-Waterman and FASTA Algorithms,” Genomics, vol. 11, pp. 635-650, 1991.
[36] M. Brudno et al., “Lagan and Multi-Lagan: Efficient Tools for Large Scale Multiple Alignment of Genomic DNA,” Genome Research, vol. 13, pp. 721-731, 2003.
[37] A.F.A. Smit and P. Green, “Repeatmasker Open-3.0,” http://www.repeatmasker.org, 1996–2004.
[38] J.J. Grefenstette, “A User's Guide to GENESIS,” technical report, Navy Center for Applied Research in AI, 1987, source code updated 1990, http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/genetic/ga/systems/genesis/.
[39] G.Z. Hertz and G.D. Stormo, “Identifying DNA and Protein Patterns with Statistically Significant Alignments of Multiple Sequences,” Bioinformatics, vol. 15, no. 7, pp. 563-577, 1999.
[40] G.E. Crooks, G. Hon, J.M. Chandonia, and S.E. Brenner, “WebLogo: A Sequence Logo Generator,” Genome Research, vol. 14, pp. 1188-1190, 2004.
[41] N. Mouchel, S.A. Henstra, V.A. McCarthy, S.H. Williams, M. Phylactides, and A. Harris, “Hnf1alpha Is Involved in Tissue-Specific Regulation of CFTR Gene Expression,” Biochemical J., vol. 378, no. Pt 3, pp. 909-918, 15 Mar. 2004.
[42] M. Levinson-Dushnik and N. Benvenisty, “Involvement of Hepatocyte Nuclear Factor 3 in Endoderm Differentiation of Embryonic Stem Cells,” Molecular and Cellular Biology, vol. 17, no. 7, pp. 3817-3822, July 1997.
[43] X. Hu, J.R. Roberts, P.L. Apopa, Y.W. Kan, and Q. Ma, “Accelerated Ovarian Failure Induced By 4-Vinyl Cyclohexene Diepoxide in Nrf2 Null Mice,” Molecular and Cellular Biology, vol. 26, no. 3, pp. 940-954, 2006.
[44] V. Bombail, K. Taylor, G.G. Gibson, and N. Plant, “Role of Sp1, C/EBP Alpha, HNF3 and PXR in the Basal- and Xenobiotic-Mediated Regulation of the CYP3A4 Gene,” Drug Metabolism and Disposition, vol. 32, no. 5, pp. 525-535, 2004.