Issue No. 05 - May (2013 vol. 24)
ISSN: 1045-9219
pp: 977-986
Henning Meyerhenke, Karlsruhe Institute of Technology (KIT), Karlsruhe
Pushkar R. Pande, Georgia Institute of Technology, Atlanta
Xing Liu, Georgia Institute of Technology, Atlanta
David A. Bader, Georgia Institute of Technology, Atlanta
ABSTRACT
The study of genomes has been revolutionized by sequencing machines that output many short overlapping substrings (called reads). In practice, the task of sequence assembly is to reconstruct long contiguous genome subsequences from the reads. With Next Generation Sequencing (NGS) technologies, assembly software needs to be more accurate, faster, and more memory-efficient due to the problem complexity and the size of the data sets. In this paper, we develop parallel algorithms and compressed data structures to address several computational challenges of NGS assembly. We demonstrate how commonly available multicore architectures can be efficiently utilized for sequence assembly. In all stages of our software Pasqual (indexing input strings, string graph construction and simplification, extraction of contiguous subsequences), we use shared-memory parallelism to speed up the assembly process. In our experiments with data of up to 6.8 billion base pairs, we demonstrate that Pasqual generally delivers the best tradeoff between speed, memory consumption, and solution quality. On synthetic and real data sets, Pasqual scales well with an increasing number of threads on our test machine with 40 CPU cores. Given enough cores, Pasqual is the fastest assembler in our comparison.
INDEX TERMS
Arrays, Bioinformatics, Assembly, Genomics, Parallel processing, Indexes, high-performance bioinformatics, Parallel algorithms, de novo sequence assembly, parallel suffix array construction, shared memory parallelism
CITATION
Henning Meyerhenke, Pushkar R. Pande, Xing Liu, David A. Bader, "PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly", IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 5, pp. 977-986, May 2013, doi:10.1109/TPDS.2012.190