Streaming Algorithms for Biological Sequence Alignment on GPUs
September 2007 (vol. 18 no. 9)
pp. 1270-1281
Sequence alignment is a common and often repeated task in molecular biology. Typical alignment operations consist of finding similarities between a pair of sequences (pairwise sequence alignment) or among a family of sequences (multiple sequence alignment). The need to speed up these operations comes from the rapid growth of biological sequence databases: every year their size increases by a factor of 1.5 to 2. In this paper, we present a new approach to high-performance biological sequence alignment based on commodity PC graphics hardware. Using modern graphics processing units (GPUs) for high-performance computing is facilitated by their enhanced programmability and motivated by their attractive price/performance ratio and rapid growth in speed. To derive an efficient mapping onto this type of architecture, we have reformulated dynamic-programming-based alignment algorithms as streaming algorithms in terms of computer graphics primitives. Our experimental results show that the GPU-based approach achieves speedups of more than one order of magnitude over optimized CPU implementations.
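To make the idea of a streaming reformulation concrete, the sketch below computes a Smith-Waterman local alignment score one anti-diagonal at a time. The abstract does not state which traversal order the paper uses, so the anti-diagonal ordering, the linear gap penalty, the scoring values, and the function name `sw_score_antidiagonal` are all assumptions made for illustration. The point is only that every cell on one anti-diagonal depends on the two previous anti-diagonals and not on its neighbors, so the inner loop could in principle be executed as one data-parallel pass.

```python
def sw_score_antidiagonal(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    Illustrative sketch only: the scoring parameters are assumed, and the
    inner loop stands in for a data-parallel pass over one anti-diagonal.
    """
    m, n = len(a), len(b)
    # prev2 and prev1 hold anti-diagonals d-2 and d-1, indexed by row i.
    # Entries that are never written stay 0, which matches the zero border.
    prev2 = [0] * (m + 1)
    prev1 = [0] * (m + 1)
    best = 0
    for d in range(2, m + n + 1):            # cells (i, j) with i + j == d
        cur = [0] * (m + 1)
        lo, hi = max(1, d - n), min(m, d - 1)
        for i in range(lo, hi + 1):          # iterations are independent
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[i] = max(
                0,
                prev2[i - 1] + s,            # cell (i-1, j-1), diagonal d-2
                prev1[i - 1] + gap,          # cell (i-1, j),   diagonal d-1
                prev1[i] + gap,              # cell (i, j-1),   diagonal d-1
            )
            best = max(best, cur[i])
        prev2, prev1 = prev1, cur            # rotate the diagonal buffers
    return best
```

With the assumed default scores, `sw_score_antidiagonal("ACGT", "ACGT")` returns 8 (four matches at +2 each). Only three buffers are live at any time, which is the kind of bounded working set that suits a stream-style pass; a real GPU mapping would also need traceback handling and affine gaps, which this sketch omits.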

Index Terms:
Streaming architectures, dynamic programming, pairwise sequence alignment, multiple sequence alignment, graphics hardware, GPGPU
Weiguo Liu, Bertil Schmidt, Gerrit Voss, Wolfgang Müller-Wittig, "Streaming Algorithms for Biological Sequence Alignment on GPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 9, pp. 1270-1281, Sept. 2007, doi:10.1109/TPDS.2007.1069