Pure Parsimony Xor Haplotyping
October-December 2010 (vol. 7 no. 4)
pp. 598-610
Paola Bonizzoni, Università Degli Studi di Milano-Bicocca, Milano
Gianluca Della Vedova, Università Degli Studi di Milano-Bicocca, Milano
Riccardo Dondi, Università degli Studi di Bergamo, Bergamo
Yuri Pirola, Università Degli Studi di Milano-Bicocca, Milano
Romeo Rizzi, Università degli Studi di Udine, Udine
Haplotype resolution from xor-genotype data has recently been formulated as a new model for genetic studies [1]. Xor-genotype data is a cheaply obtainable type of data that distinguishes heterozygous from homozygous sites without identifying the homozygous alleles. In this paper, we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We give exact solutions to the problem by providing polynomial-time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results rest on combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial-time k-approximation, where k is the maximum number of xor-genotypes containing a given single nucleotide polymorphism (SNP). Finally, we propose a heuristic and present an experimental analysis showing that it scales to large real-world instances taken from the HapMap project.
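To make the model concrete, here is a minimal sketch. It assumes biallelic SNPs, with haplotypes and xor-genotypes encoded as equal-length binary strings, where an xor-genotype has '1' at heterozygous sites and '0' at homozygous sites. Under the pure parsimony objective, we seek the smallest set of haplotypes such that every xor-genotype is the xor of some pair from the set. The function names (`xor_genotype`, `resolves`, `min_resolving_set`) are hypothetical. The search is an exponential brute force for tiny instances only. It is not the authors' polynomial-time, fixed-parameter, approximation, or heuristic algorithm.

```python
from itertools import combinations, combinations_with_replacement


def xor_genotype(h1: str, h2: str) -> str:
    """Xor-genotype of a haplotype pair: '1' where the alleles differ
    (heterozygous site), '0' where they agree (homozygous site)."""
    return "".join("1" if a != b else "0" for a, b in zip(h1, h2))


def resolves(haplotypes, genotypes) -> bool:
    """True if every xor-genotype equals the xor of some (not necessarily
    distinct) pair of haplotypes in the set."""
    produced = {xor_genotype(a, b)
                for a, b in combinations_with_replacement(haplotypes, 2)}
    return all(g in produced for g in genotypes)


def min_resolving_set(genotypes):
    """Brute-force pure parsimony on a tiny instance: return a smallest
    haplotype set resolving all xor-genotypes (assumes a non-empty list of
    equal-length binary strings)."""
    m = len(genotypes[0])
    zero = "0" * m
    # Every nonzero binary string of length m is a candidate haplotype.
    candidates = [format(i, f"0{m}b") for i in range(1, 2 ** m)]
    # Xoring every haplotype with one fixed member preserves all pairwise
    # xors, so some optimal solution contains the all-zero haplotype.
    for k in range(len(candidates) + 1):
        for rest in combinations(candidates, k):
            hs = (zero,) + rest
            if resolves(hs, genotypes):
                return list(hs)
    return None  # unreachable: {zero} plus all genotypes always resolves
```

For example, `min_resolving_set(["110", "011", "101"])` needs three haplotypes. The pairwise xors of any three haplotypes 0, a, b are a, b, and a xor b, and here those three values can be exactly the three xor-genotypes. Fixing the all-zero haplotype reflects the symmetry of the xor model: flipping the same sites in every haplotype leaves all xor-genotypes unchanged.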

Index Terms:
Algorithms, haplotype resolution, pure parsimony, approximation algorithms, graph representation.
Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, Romeo Rizzi, "Pure Parsimony Xor Haplotyping," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 4, pp. 598-610, Oct.-Dec. 2010, doi:10.1109/TCBB.2010.52
