Issue No.02 - March-April (2013 vol.10)
pp: 481-493
Kevin W. DeRonne, Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
George Karypis, Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions, drawn from a pool of 11 functions, are applied to 588 pairs of proteins in the ce_ref data set. The objective combinations that perform best on ce_ref are also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain alignments comparable to those on the exact frontier, while taking on average less than 1/58th of the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than that of the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
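The optimal substructure property mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a linear (non-affine) gap penalty shared by all objectives, scores plain characters rather than profile columns, and returns only the non-dominated score vectors without a traceback. The names `dominates`, `pareto_frontier`, and `pareto_alignment_scores` are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]

def dominates(a: Vec, b: Vec) -> bool:
    """a dominates b: at least as good on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(points: Sequence[Vec]) -> List[Vec]:
    """Return the non-dominated points (maximization), without duplicates."""
    frontier: List[Vec] = []
    for p in points:
        if any(dominates(q, p) for q in frontier):
            continue  # p is dominated by a point already kept
        frontier = [q for q in frontier if not dominates(p, q)]
        if p not in frontier:
            frontier.append(p)
    return frontier

def pareto_alignment_scores(
    a: str,
    b: str,
    scorers: Sequence[Callable[[str, str], float]],
    gap: float = -1.0,  # assumption: one linear gap penalty for every objective
) -> List[Vec]:
    """Global alignment DP in which each cell keeps a Pareto set of score vectors."""
    k = len(scorers)
    gap_vec = tuple(gap for _ in range(k))

    def add(v: Vec, w: Vec) -> Vec:
        return tuple(x + y for x, y in zip(v, w))

    # F[i][j] holds the non-dominated score vectors for aligning a[:i] with b[:j].
    F = [[[] for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    F[0][0] = [tuple(0.0 for _ in range(k))]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            cands: List[Vec] = []
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                m = tuple(s(a[i - 1], b[j - 1]) for s in scorers)
                cands += [add(v, m) for v in F[i - 1][j - 1]]
            if i > 0:  # gap in b
                cands += [add(v, gap_vec) for v in F[i - 1][j]]
            if j > 0:  # gap in a
                cands += [add(v, gap_vec) for v in F[i][j - 1]]
            # Adding the same vector to every prefix solution preserves dominance,
            # so dominated prefix solutions can be pruned safely at each cell.
            F[i][j] = pareto_frontier(cands)
    return F[len(a)][len(b)]

# Two deliberately conflicting toy objectives (illustrative only).
identity = lambda x, y: 1.0 if x == y else -1.0
contrarian = lambda x, y: -3.0 if x == y else 0.0
result = pareto_alignment_scores("A", "A", [identity, contrarian])
```

In this toy setting the number of vectors kept per cell can grow with sequence length and with the number of objectives, which is consistent with the abstract's motivation for a faster heuristic that approximates the exact frontier.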
Pareto optimization, Proteins, Amino acids, Vectors, Heuristic algorithms, Linear programming, pairwise sequence alignment
Kevin W. DeRonne, George Karypis, "Pareto Optimal Pairwise Sequence Alignment", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 481-493, March-April 2013, doi:10.1109/TCBB.2013.2