Islands of Tractability for Parsimony Haplotyping
July-September 2006 (vol. 3 no. 3)
pp. 303-311
We study the parsimony approach to haplotype inference, which calls for finding a minimum-cardinality set of haplotypes that explains an input set of genotypes. We prove that the problem is APX-hard even in very restricted cases. On the positive side, we identify islands of tractability for the problem by focusing on instances with a specific structure of haplotype sharing among the input genotypes. We exploit the structure of those instances to give polynomial-time and constant-factor approximation algorithms for the problem. We also show that the general parsimony haplotyping problem is fixed-parameter tractable.
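To make the problem statement concrete, here is a minimal brute-force sketch of pure parsimony haplotyping for tiny inputs. It is not the authors' algorithm. It assumes a common encoding that the abstract does not fix: a genotype site is 0 or 1 when homozygous and 2 when heterozygous, and a haplotype is a 0/1 tuple. A pair of haplotypes explains a genotype when both agree with it at homozygous sites and differ at heterozygous ones. The helper names (`explains`, `pure_parsimony`, and so on) are hypothetical.

```python
from itertools import combinations, product

# Assumed encoding (illustrative, not taken from the article):
# genotype site 0/1 = homozygous, 2 = heterozygous; haplotypes are 0/1 tuples.

def explains(h1, h2, g):
    """True if the haplotype pair (h1, h2) explains genotype g."""
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:          # heterozygous site needs differing alleles
                return False
        elif a != s or b != s:  # homozygous site needs both alleles equal to s
            return False
    return True

def candidate_haplotypes(g):
    """All haplotypes compatible with g: only heterozygous sites are free."""
    choices = [(0, 1) if s == 2 else (s,) for s in g]
    return set(product(*choices))

def is_explained_by(genotypes, hap_set):
    """Every genotype must be explained by some pair (possibly h, h) from hap_set."""
    haps = list(hap_set)
    return all(
        any(explains(h1, h2, g) for h1 in haps for h2 in haps)
        for g in genotypes
    )

def pure_parsimony(genotypes):
    """Exhaustive search for a minimum-size explaining haplotype set.

    Only haplotypes compatible with some genotype can be useful, so the
    search is restricted to that pool. Runs in exponential time.
    """
    pool = set()
    for g in genotypes:
        pool |= candidate_haplotypes(g)
    pool = sorted(pool)
    for k in range(1, len(pool) + 1):
        for subset in combinations(pool, k):
            if is_explained_by(genotypes, subset):
                return set(subset)
    return set()
```

For example, the genotypes `(2, 2)` and `(0, 2)` need three haplotypes: no two-haplotype set can resolve both, because the only pair explaining `(0, 2)` is `(0, 0)` and `(0, 1)`, which do not differ at both sites. The exhaustive search is consistent with the hardness results above; the structured instances studied in the article are exactly where such blind enumeration can be avoided.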

Index Terms:
Biology and genetics, graph algorithms, analysis of algorithms and problem complexity.
Roded Sharan, Bjarni V. Halldórsson, Sorin Istrail, "Islands of Tractability for Parsimony Haplotyping," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 303-311, July-Sept. 2006, doi:10.1109/TCBB.2006.40