Issue No. 04 - April (2011 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.101
Wuchun Feng, Virginia Tech, Blacksburg
Xiaosong Ma, North Carolina State University and Oak Ridge National Laboratory, Raleigh
Nagiza F. Samatova, North Carolina State University and Oak Ridge National Laboratory, Raleigh
Heshan Lin, Virginia Tech, Blacksburg
With the explosive growth of genomic information, searching sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that a lack of coordination between computation scheduling and I/O optimization can result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.
Scheduling, parallel I/O, bioinformatics, parallel genomic sequence search, BLAST.
Wuchun Feng, Xiaosong Ma, Nagiza F. Samatova, Heshan Lin, "Coordinating Computation and I/O in Massively Parallel Sequence Search", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 4, pp. 529-543, April 2011, doi:10.1109/TPDS.2010.101