2009 International Conference on Parallel Processing (2009)
Sept. 22, 2009 to Sept. 25, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPP.2009.29
Recent advances in DNA sequencing techniques have led to an unprecedented accumulation and availability of molecular sequence data that needs to be analyzed. This data explosion, in combination with the multi-core revolution, also affects the computational kernels for phylogenetic inference (reconstruction of evolutionary trees from molecular sequence data) under the widely used Maximum Likelihood (ML) model. At present, analyses of so-called multi-gene or phylogenomic alignments, i.e., input data sets that comprise concatenated sequence data of several genes, are becoming increasingly popular. Usually such multi-gene analyses are partitioned, i.e., a separate set of likelihood model parameters is estimated for each gene/partition. While the phylogenetic likelihood function exhibits intrinsic fine-grained parallelism, the parallel computation of the likelihood function in such partitioned multi-gene analyses can lead to significant load-balance problems. Here, we describe these problems for the first time, discuss the implications for the design of "classic" ML-based as well as Bayesian search algorithms, and provide an initial solution that yields up to eight-fold improvements in speedup values on AMD Barcelona and Sun x4600 16-core systems for realistic application scenarios.
PLK, Phylogenetic Likelihood Function, Parallelization, Load Imbalance
A. Stamatakis and M. Ott, "Load Balance in the Phylogenetic Likelihood Kernel," 2009 International Conference on Parallel Processing (ICPP), Vienna, Austria, 2009, pp. 348-355.