Issue No. 04 - October-December (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.52
Paola Bonizzoni, Università degli Studi di Milano-Bicocca, Milano
Gianluca Della Vedova, Università degli Studi di Milano-Bicocca, Milano
Riccardo Dondi, Università degli Studi di Bergamo, Bergamo
Yuri Pirola, Università degli Studi di Milano-Bicocca, Milano
Romeo Rizzi, Università degli Studi di Udine, Udine
Haplotype resolution from xor-genotype data has recently been formulated as a new model for genetic studies. Xor-genotype data is a cheaply obtainable type of data that distinguishes heterozygous from homozygous sites without identifying the homozygous alleles. In this paper, we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial-time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial-time k-approximation, where k is the maximum number of xor-genotypes containing a given single nucleotide polymorphism (SNP). Finally, we propose a heuristic and present an experimental analysis showing that it scales to large real-world instances taken from the HapMap project.
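As an illustration of the model described above, the following minimal Python sketch encodes haplotypes as 0/1 strings, derives a xor-genotype as the set of sites where two haplotypes differ, and searches for a smallest haplotype set that explains a list of xor-genotypes. The function names (`xor_genotype`, `resolves`, `min_haplotype_set`) and the string encoding are assumptions made for this example, as is the choice to let a haplotype pair with itself (yielding an all-homozygous xor-genotype). The search is a naive exponential enumeration intended only for toy inputs; it is not the polynomial-time, fixed-parameter, approximation, or heuristic algorithm of the paper.

```python
from itertools import combinations, product


def xor_genotype(h1, h2):
    """Return the xor-genotype of two haplotypes (equal-length 0/1 strings).

    A '1' marks a heterozygous site (the haplotypes differ); a '0' marks a
    homozygous site, without saying which allele is present.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(h1, h2))


def resolves(haplotypes, xor_genotypes):
    """Check that every xor-genotype is the XOR of some pair of haplotypes.

    Assumption for this sketch: a haplotype may be paired with itself.
    """
    producible = {xor_genotype(a, b) for a in haplotypes for b in haplotypes}
    return all(g in producible for g in xor_genotypes)


def min_haplotype_set(xor_genotypes):
    """Brute-force a smallest resolving haplotype set (toy sizes only)."""
    if not xor_genotypes:
        return []
    m = len(xor_genotypes[0])
    # All 2^m candidate haplotypes over m sites.
    universe = ["".join(bits) for bits in product("01", repeat=m)]
    # Try candidate sets in order of increasing size; the first hit is minimum.
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            if resolves(candidate, xor_genotypes):
                return list(candidate)
    return []
```

For example, the xor-genotypes `110`, `011`, and `101` are all explained by the three haplotypes `000`, `110`, and `011`. No two haplotypes suffice, because a pair can produce only one nonzero xor-genotype.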
Genetics, Biological cells, Phylogeny, Polynomials, Inference algorithms, Organisms, Approximation algorithms, Biological system modeling, Biology computing, Data mining
P. Bonizzoni, G. Della Vedova, R. Dondi, Y. Pirola and R. Rizzi, "Pure Parsimony Xor Haplotyping," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 4, pp. 598-610, 2010.