Issue No. 05 - September/October (2011 vol. 8)
pp: 1183-1195
Ekhine Irurozki , University of the Basque Country, Donostia
Borja Calvo , University of the Basque Country, Donostia
Jose A. Lozano , University of the Basque Country, Donostia
Haplotype data are especially important in the study of complex diseases because they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony (HIPP) approach, casts the problem as an optimization problem, which has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that do not help in finding the optimal solution. This preprocessing can be coupled with any current HIPP solver that needs to preprocess the genotype data. To test it, we used two state-of-the-art solvers, RTIP and GAHAP, on simulated data and on real HapMap data. Thanks to the reduction in computational time and memory achieved by our preprocessing, problem instances that were previously unaffordable can now be solved efficiently.
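To make the HIPP objective concrete, here is a minimal sketch in Python: find the smallest set of haplotypes such that every genotype is explained by a pair drawn from that set. It is not the authors' group-based preprocessing, which the abstract does not spell out. It assumes a common encoding (genotype sites '0'/'1' homozygous, '2' heterozygous), and the names `explains`, `resolutions`, and `solve_hipp_bruteforce` are hypothetical. The only reduction applied is merging duplicate genotypes, which cannot change the optimum. The search is exhaustive and exponential, so it is usable only on toy inputs; practical solvers avoid this search entirely.

```python
from itertools import combinations, product

# Assumed encoding (a common convention, not taken from the paper):
# genotype sites are '0'/'1' (homozygous) or '2' (heterozygous);
# haplotypes are strings over '0'/'1' of the same length.


def explains(h1: str, h2: str, g: str) -> bool:
    """True if the unordered haplotype pair (h1, h2) resolves genotype g."""
    return all(a != b if c == "2" else a == b == c for a, b, c in zip(h1, h2, g))


def resolutions(genotype: str) -> list[tuple[str, str]]:
    """All unordered haplotype pairs explaining the genotype (2^(k-1) for k > 0 het sites)."""
    het = [i for i, c in enumerate(genotype) if c == "2"]
    if not het:
        return [(genotype, genotype)]
    pairs = []
    # Fix the first heterozygous site to '0' in h1 so each unordered pair is listed once.
    for bits in product("01", repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        h1[het[0]], h2[het[0]] = "0", "1"
        for pos, b in zip(het[1:], bits):
            h1[pos] = b
            h2[pos] = "1" if b == "0" else "0"
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


def solve_hipp_bruteforce(genotypes: list[str]) -> tuple[set[str], dict[str, tuple[str, str]]]:
    """Exact pure-parsimony phasing by exhaustive search (tiny inputs only)."""
    # Trivially safe reduction: duplicate genotypes do not change the optimum.
    genos = sorted(set(genotypes))
    if not genos:
        return set(), {}
    res = {g: resolutions(g) for g in genos}
    # Candidate haplotypes: every haplotype appearing in some resolution.
    candidates = sorted({h for pairs in res.values() for pair in pairs for h in pair})
    # Try candidate subsets by increasing size; the first feasible one is minimal.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            phasing = {}
            for g in genos:
                pair = next((p for p in res[g] if p[0] in chosen and p[1] in chosen), None)
                if pair is None:
                    break
                phasing[g] = pair
            else:
                return chosen, phasing
    raise AssertionError("unreachable: the full candidate set explains every genotype")


if __name__ == "__main__":
    haplotypes, phasing = solve_hipp_bruteforce(["00", "02", "22", "02"])
    print(len(haplotypes), sorted(haplotypes))
    for g, (h1, h2) in phasing.items():
        print(g, "=", h1, "+", h2)
```

On this toy input the minimum is three haplotypes. Genotype "22" can be resolved either as 00+11 or as 01+10, so more than one optimal set exists, and which one is returned depends only on the search order.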
Biology and genetics, haplotype inference, optimization.
Ekhine Irurozki, Borja Calvo, Jose A. Lozano, "A Preprocessing Procedure for Haplotype Inference by Pure Parsimony", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1183-1195, September/October 2011, doi:10.1109/TCBB.2010.125