Issue No. 04 - April (2011 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.101
Heshan Lin, Virginia Tech, Blacksburg
Xiaosong Ma, North Carolina State University and Oak Ridge National Laboratory, Raleigh
Wuchun Feng, Virginia Tech, Blacksburg
Nagiza F. Samatova, North Carolina State University and Oak Ridge National Laboratory, Raleigh
With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.
Scheduling, parallel I/O, bioinformatics, parallel genomic sequence search, BLAST.
H. Lin, X. Ma, W. Feng and N. F. Samatova, "Coordinating Computation and I/O in Massively Parallel Sequence Search," in IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 4, pp. 529-543, 2011.