Issue No. 01 - January-February (2011 vol. 8)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.71
Dario Gasbarra, University of Helsinki, Helsinki
Sangita Kulathinal, University of Helsinki, Helsinki and Indic Society for Education and Development, Nashik
Matti Pirinen, University of Helsinki, Helsinki
Mikko J. Sillanpää, University of Helsinki, Helsinki
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material from up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of loci considered exceed the limits of several earlier methods. In the example analyses on real data from the HapMap database, the proposed model clearly outperforms a deterministic greedy algorithm. With a small number of loci, the proposed method performs similarly to an EM algorithm that uses a multinormal approximation for the pooled allele frequencies but does not use prior information about the haplotypes. The method has been implemented in Matlab, and the code is available from the authors upon request.
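To make the link between haplotype frequencies and pooled allele frequencies concrete, the following is a minimal sketch of the multinormal approximation mentioned above; it is not the authors' Matlab code and includes neither the Bayesian prior nor the EM algorithm. It assumes haplotypes coded as 0/1 allele vectors over a fixed set of loci, frequencies that sum to 1, and a pool of N haplotypes (2 per diploid individual) whose counts are Multinomial(N, p). Under that assumption the pooled allele-frequency vector has mean H^T p and covariance H^T (diag(p) - p p^T) H / N. All function names are hypothetical.

```python
import random

# Hypothetical representation: each haplotype is a tuple of 0/1 alleles over the
# same ordered loci, and freqs[k] is the frequency of haplotypes[k] (sum = 1).


def expected_allele_freqs(haplotypes, freqs):
    """Mean pooled frequency of allele 1 at each locus: sum_k p_k * h_kj."""
    n_loci = len(haplotypes[0])
    return [sum(p * h[j] for h, p in zip(haplotypes, freqs)) for j in range(n_loci)]


def approx_covariance(haplotypes, freqs, n_haplotypes):
    """Covariance of the pooled allele-frequency vector when the haplotype counts
    in a pool of n_haplotypes are Multinomial(n_haplotypes, freqs):
    H^T (diag(p) - p p^T) H / n_haplotypes (the multinormal approximation)."""
    mu = expected_allele_freqs(haplotypes, freqs)
    n_loci = len(mu)
    cov = [[0.0] * n_loci for _ in range(n_loci)]
    for i in range(n_loci):
        for j in range(n_loci):
            # E[h_i * h_j] for a single haplotype drawn at random.
            second_moment = sum(p * h[i] * h[j] for h, p in zip(haplotypes, freqs))
            cov[i][j] = (second_moment - mu[i] * mu[j]) / n_haplotypes
    return cov


def simulate_pool(haplotypes, freqs, n_haplotypes, rng=random):
    """Draw n_haplotypes haplotypes and return the observed pooled allele frequencies."""
    drawn = rng.choices(haplotypes, weights=freqs, k=n_haplotypes)
    n_loci = len(haplotypes[0])
    return [sum(h[j] for h in drawn) / n_haplotypes for j in range(n_loci)]


if __name__ == "__main__":
    H = [(0, 0), (1, 0), (1, 1)]  # toy haplotype set over two loci
    p = [0.5, 0.3, 0.2]
    n = 2 * 100  # assumed pool of 100 diploid individuals
    print(expected_allele_freqs(H, p))
    print(approx_covariance(H, p, n))
    print(simulate_pool(H, p, n, random.Random(0)))
```

For the toy haplotype set, the expected pooled allele frequencies are 0.5 and 0.2. Because only the pooled frequencies are observed, different haplotype frequency vectors can give the same mean, which is why prior information about the plausible haplotypes (for example from HapMap) helps to constrain the estimate.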
Index Terms: DNA pools, haplotype frequency estimation, HapMap database, multinomial distribution.
Dario Gasbarra, Sangita Kulathinal, Matti Pirinen, Mikko J. Sillanpää, "Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 1, pp. 36-44, January-February 2011, doi:10.1109/TCBB.2009.71