A Framework for Scalable Genome Assembly on Clusters, Clouds, and Grids
Dec. 2012 (vol. 23 no. 12)
pp. 2189-2197
Christopher Moretti, Princeton University, Princeton
Andrew Thrasher, University of Notre Dame, Notre Dame
Li Yu, University of Notre Dame, Notre Dame
Michael Olson, University of Notre Dame, Notre Dame
Scott Emrich, University of Notre Dame, Notre Dame
Douglas Thain, University of Notre Dame, Notre Dame
Bioinformatics researchers need efficient means to process large collections of genomic sequence data. One application of interest, genome assembly, has great potential for parallelization; however, most previous attempts at parallelization require uncommon high-end hardware. This paper introduces the Scalable Assembler at Notre Dame (SAND) framework that can achieve significant speedup using large numbers of commodity machines harnessed from clusters, clouds, and grids. SAND interfaces with the Celera open-source assembly toolkit, replacing two independent sequential modules with scalable parallel alternatives: the candidate selector exploits distributed memory capacity, and the sequence aligner exploits distributed computing capacity. For large problems, these modules provide robust task and data management while also achieving speedup with high efficiency. We show results for several data sets ranging from 738 thousand to over 320 million alignments using resources ranging from a small cluster to more than a thousand nodes spanning three institutions.
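The abstract describes a two-stage pipeline: a candidate selector that finds read pairs worth aligning, then an aligner whose work is split into independent tasks. The single-process sketch below illustrates that division of labor. It does not reproduce SAND's implementation. Several details are assumptions not stated in the abstract:

- Candidates are read pairs sharing at least `min_shared` k-mers.
- Pairs are grouped into fixed-size batches that stand in for distributed tasks.
- An exact suffix-prefix match replaces real sequence alignment.
- Reverse complements are ignored.

All function names and parameters are hypothetical. In SAND, the selector's index is spread across the memory of many machines, and the tasks run on remote workers rather than in a local loop.

```python
from collections import defaultdict
from itertools import combinations


def kmer_index(reads, k):
    """Map each k-mer to the set of read IDs that contain it."""
    index = defaultdict(set)
    for read_id, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(read_id)
    return index


def candidate_pairs(reads, k, min_shared=1):
    """Candidate selection (assumed criterion): return read pairs sharing at
    least `min_shared` distinct k-mers, so only these pairs get aligned."""
    shared = defaultdict(int)
    for ids in kmer_index(reads, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return sorted(p for p, n in shared.items() if n >= min_shared)


def make_tasks(pairs, batch_size):
    """Group candidate pairs into fixed-size batches; each batch stands in for
    one independent alignment task sent to a worker (hypothetical granularity)."""
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if
    shorter than `min_len` (exact match, a stand-in for real alignment)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if length > 0 and a[-length:] == b[:length]:
            return length
    return 0


def run_task(task, reads, min_len):
    """What one worker would do: score every pair in its batch, in both orders."""
    results = []
    for a, b in task:
        best = max(suffix_prefix_overlap(reads[a], reads[b], min_len),
                   suffix_prefix_overlap(reads[b], reads[a], min_len))
        if best >= min_len:
            results.append((a, b, best))
    return results


# Toy example: r1 and r2 overlap by 7 bases; r3 shares no 5-mer with either.
reads = {"r1": "ACGTACGTGG", "r2": "TACGTGGCCA", "r3": "TTTTTTTTTT"}
tasks = make_tasks(candidate_pairs(reads, k=5), batch_size=2)
overlaps = [hit for task in tasks for hit in run_task(task, reads, min_len=4)]
print(overlaps)  # [('r1', 'r2', 7)]
```

Because each batch in `tasks` depends only on the reads it names, the batches can be dispatched to any number of workers and the results concatenated. This independence is what lets the alignment stage scale with available machines.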
Index Terms:
Bioinformatics, Genomics, Cloud computing, Distributed processing, Random access memory, Biomedical informatics, genome assembly, Distributed systems, bioinformatics
Citation:
Christopher Moretti, Andrew Thrasher, Li Yu, Michael Olson, Scott Emrich, Douglas Thain, "A Framework for Scalable Genome Assembly on Clusters, Clouds, and Grids," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 12, pp. 2189-2197, Dec. 2012, doi:10.1109/TPDS.2012.80