IEEE/ACM Transactions on Computational Biology and Bioinformatics


Vol. 8, Issue No. 01, January-February 2011

pp. 36-44

Dario Gasbarra, University of Helsinki, Helsinki

Sangita Kulathinal, University of Helsinki, Helsinki and Indic Society for Education and Development, Nashik

Matti Pirinen, University of Helsinki, Helsinki

Mikko J. Sillanpää, University of Helsinki, Helsinki

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2009.71

ABSTRACT

We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material from up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of loci considered exceed the limits of several earlier methods. In the example analyses on real data from the HapMap database, the proposed model clearly outperforms a deterministic greedy algorithm. With a small number of loci, the proposed method performs similarly to an EM algorithm that uses a multivariate normal approximation for the pooled allele frequencies but does not use prior information about the haplotypes. The method has been implemented in Matlab, and the code is available from the authors upon request.
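The abstract does not spell out the model, so the following is only an illustrative sketch of one standard continuous approximation, not the authors' implementation. Suppose a pool holds n diploid individuals, so its 2n chromosomes carry the candidate haplotypes with counts distributed as Multinomial(2n, p). Replacing that multinomial by a Gaussian with the same mean and covariance makes the pooled allele frequencies approximately N(Hp, H(diag(p) - p p^T) H^T / (2n)). Here H is a 0/1 loci-by-haplotypes matrix listing the allowed haplotypes, which is the kind of prior haplotype set one might take from a database such as HapMap. This is the multinormal form the abstract attributes to the comparison EM algorithm; whether the authors' Bayesian model uses exactly this approximation is an assumption. All function names, the `noise_var` measurement-error term, and the toy haplotypes are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pooled_allele_moments(H: Sequence[Sequence[int]], p: Sequence[float],
                          n_ind: int) -> Tuple[List[float], Matrix]:
    """Mean and covariance of the pooled allele frequencies at each locus.

    H[l][k] is 1 if candidate haplotype k carries the coded allele at
    locus l, else 0. p[k] is the frequency of haplotype k. The 2*n_ind
    chromosomes of n_ind diploid individuals give haplotype counts
    ~ Multinomial(2*n_ind, p); only its first two moments are kept.
    """
    if not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    n_chrom = 2 * n_ind
    n_loci, n_hap = len(H), len(p)
    # Expected allele frequency per locus: (H p)_l
    mean = [sum(H[l][k] * p[k] for k in range(n_hap)) for l in range(n_loci)]
    # Covariance H (diag(p) - p p^T) H^T / (2n), written entrywise
    cov = [[0.0] * n_loci for _ in range(n_loci)]
    for a in range(n_loci):
        for b in range(n_loci):
            s = sum(H[a][k] * H[b][k] * p[k] for k in range(n_hap))
            cov[a][b] = (s - mean[a] * mean[b]) / n_chrom
    return mean, cov


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvn_logpdf(y: Sequence[float], mean: Sequence[float], cov: Matrix) -> float:
    """Log density of a multivariate normal via its Cholesky factor."""
    L = cholesky(cov)
    n = len(y)
    # Forward substitution solves L z = y - mean, so z.z is the Mahalanobis term
    z = [0.0] * n
    for i in range(n):
        acc = y[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))
        z[i] = acc / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(zi * zi for zi in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def pooled_loglik(H: Sequence[Sequence[int]], p: Sequence[float],
                  pools: Sequence[Tuple[Sequence[float], int]],
                  noise_var: float = 1e-4) -> float:
    """Approximate log-likelihood of p given independent pools.

    Each pool is (observed allele frequencies, number of individuals).
    noise_var is a hypothetical per-locus measurement-error variance that
    also keeps the covariance positive definite.
    """
    total = 0.0
    for y, n_ind in pools:
        mean, cov = pooled_allele_moments(H, p, n_ind)
        for l in range(len(y)):
            cov[l][l] += noise_var
        total += mvn_logpdf(y, mean, cov)
    return total
```

For example, with candidate haplotypes 000, 110, 011 and 101 (built as `H = [[int(h[l]) for h in haps] for l in range(3)]`) and one pool of 100 individuals, `pooled_loglik` scores a frequency vector that reproduces the observed allele frequencies higher than a distant one. In a Bayesian treatment this approximate likelihood would be combined with a prior on p over the database haplotypes (a Dirichlet prior is one common choice) and explored by MCMC; that step is not shown and is not claimed to match the paper.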

INDEX TERMS

DNA pools, haplotype frequency estimation, HapMap database, multinomial distribution.

CITATION

Dario Gasbarra, Sangita Kulathinal, Matti Pirinen, Mikko J. Sillanpää, "Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 8, no. 1, pp. 36-44, January-February 2011, doi:10.1109/TCBB.2009.71.