Issue No.04 - April (2011 vol.22)

pp: 529-543

Heshan Lin, Virginia Tech, Blacksburg

Xiaosong Ma, North Carolina State University and Oak Ridge National Laboratory, Raleigh

Wuchun Feng, Virginia Tech, Blacksburg

Nagiza F. Samatova, North Carolina State University and Oak Ridge National Laboratory, Raleigh

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.101

ABSTRACT

With the explosive growth of genomic information, searching sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search exhibits highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have each been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that a lack of coordination between computation scheduling and I/O optimization can result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation with high-performance noncontiguous I/O.

INDEX TERMS

Scheduling, parallel I/O, bioinformatics, parallel genomic sequence search, BLAST.

CITATION

Heshan Lin, Xiaosong Ma, Wuchun Feng, Nagiza F. Samatova, "Coordinating Computation and I/O in Massively Parallel Sequence Search",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 22, no. 4, pp. 529-543, April 2011, doi:10.1109/TPDS.2010.101