CUDA-BLASTP: Accelerating BLASTP on CUDA-Enabled Graphics Hardware
November/December 2011 (vol. 8 no. 6)
pp. 1678-1684
Weiguo Liu, Nanyang Technological University, Singapore
Bertil Schmidt, Nanyang Technological University, Singapore
Wolfgang Müller-Wittig, Nanyang Technological University, Singapore
Scanning protein sequence databases is an often-repeated task in computational biology and bioinformatics. However, scanning large protein databases, such as GenBank, with popular tools such as BLASTP requires long runtimes on sequential architectures. Due to the continuing rapid growth of sequence databases, there is a high demand to accelerate this task. In this paper, we demonstrate how GPUs, powered by the Compute Unified Device Architecture (CUDA), can be used as an efficient computational platform to accelerate the BLASTP algorithm. To exploit the GPU's capabilities for accelerating BLASTP, we use a compressed deterministic finite state automaton for hit detection together with a hybrid parallelization scheme. Our implementation achieves speedups of up to 10.0 on an NVIDIA GeForce GTX 295 GPU compared to the sequential NCBI BLASTP 2.2.22. The CUDA-BLASTP source code is available at

Index Terms:
BLAST, dynamic programming, sequence alignment, graphics hardware, GPGPU, CUDA.
Weiguo Liu, Bertil Schmidt, Wolfgang Müller-Wittig, "CUDA-BLASTP: Accelerating BLASTP on CUDA-Enabled Graphics Hardware," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1678-1684, Nov.-Dec. 2011, doi:10.1109/TCBB.2011.33