Issue No.04 - April (2011 vol.22)
pp: 529-543
Xiaosong Ma, North Carolina State University and Oak Ridge National Laboratory, Raleigh
Wuchun Feng, Virginia Tech, Blacksburg
Heshan Lin, Virginia Tech, Blacksburg
With the explosive growth of genomic information, searching sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search exhibits highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus key to designing scalable sequence-search tools for massively parallel computers. While computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications in the context of genomic sequence search. Our study reveals that a lack of coordination between computation scheduling and I/O optimization can result in severe performance issues. We then propose an integrated scheduling approach that improves sequence-search throughput by coordinating dynamic load balancing of computation with high-performance noncontiguous I/O.
Scheduling, parallel I/O, bioinformatics, parallel genomic sequence search, BLAST.
Xiaosong Ma, Wuchun Feng, Heshan Lin, "Coordinating Computation and I/O in Massively Parallel Sequence Search", IEEE Transactions on Parallel & Distributed Systems, vol.22, no. 4, pp. 529-543, April 2011, doi:10.1109/TPDS.2010.101