IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue No.01 - January-February (2011 vol.8)

pp: 234-245

Md Tamjidul Hoque, Griffith University, Nathan

Madhu Chetty, Monash University, Churchill

Andrew Lewis, Griffith University, Nathan

Abdul Sattar, Griffith University, Nathan

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.34

ABSTRACT

This paper presents the impact of twins, and measures for their removal, in the population of a genetic algorithm (GA) applied to effective conformational searching. It is conclusively shown that a twin removal strategy for a GA provides considerably enhanced performance when investigating solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as successive generations lose their ability to produce significant differences, which can lead to the solution stalling. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, with empirical results consistently exhibiting significant improvements in solving PSP problems.
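
The relaxed notion of twins described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how correlation between chromosomes is measured, so the sketch assumes a simple position-wise match fraction, and the names `similarity`, `remove_twins`, and `threshold` are hypothetical. It also assumes the population is ordered best-first, so that the fitter member of each twin group is the one retained.

```python
from typing import List, Sequence


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two equal-length chromosomes agree.

    Assumption: this match fraction stands in for the paper's
    correlation measure, which the abstract does not specify.
    """
    if len(a) != len(b):
        raise ValueError("chromosomes must have equal length")
    if len(a) == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def remove_twins(population: List[Sequence[str]],
                 threshold: float = 1.0) -> List[Sequence[str]]:
    """Keep a chromosome only if it is not a twin of one already kept.

    threshold = 1.0 removes identical chromosomes only; a lower value
    also removes highly similar ones (the relaxed twin definition).
    Assumption: `population` is sorted best-first.
    """
    kept: List[Sequence[str]] = []
    for chrom in population:
        if all(similarity(chrom, k) < threshold for k in kept):
            kept.append(chrom)
    return kept


# Hypothetical move-string chromosomes (L = left, R = right, F = forward).
population = ["LLRRFF", "LLRRFF", "LLRRFL", "RRLLFF"]
strict = remove_twins(population, threshold=1.0)
relaxed = remove_twins(population, threshold=0.8)
```

With `threshold=1.0` only the exact duplicate is dropped. With `threshold=0.8` the chromosome that differs from the first in a single position is also treated as a twin. In a full GA the removed slots would typically be refilled, for example with newly generated chromosomes, but that step is outside this sketch.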

INDEX TERMS

Genetic algorithms, twin removal, protein structure prediction, search algorithms, chromosome.

CITATION

Md Tamjidul Hoque, Madhu Chetty, Andrew Lewis, Abdul Sattar, "Twin Removal in Genetic Algorithms for Protein Structure Prediction Using Low-Resolution Model",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 8, no. 1, pp. 234-245, January-February 2011, doi:10.1109/TCBB.2009.34