IEEE International Performance Computing and Communications Conference (2011)
Orlando, FL, USA
Nov. 17, 2011 to Nov. 19, 2011
ISBN: 978-1-4673-0010-0
pp: 1-2
Saddam Quirem, Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, Texas, USA
Fahian Ahmed, Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, Texas, USA
Byeong Kil Lee, Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, Texas, USA
ABSTRACT
The dynamic programming matrices and the P7Viterbi algorithm of HMMER 3.0 expose a high degree of parallelism: the score of every query can be calculated in parallel, with one thread per query. In this paper, this parallelism is exploited using CUDA on a GPGPU. The CUDA implementation, run on a Tesla C1060, achieved a 10-15x speedup depending on the number of queries. Without concurrent kernel execution and memory transfers, a speedup of over 4x was achieved in terms of total execution time. Because the CPU outperforms the GPU over a wide range of data sizes, it is important that CUDA-enabled programs properly select when to use the GPU for acceleration and when not to.
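The independence of per-query scores described in the abstract can be illustrated with a minimal CPU-side sketch. This is not the authors' CUDA kernel and not HMMER's Plan7 model: the two-state HMM, its probabilities, and the names `viterbi_score` and `score_all` are hypothetical, chosen only to show that each query fills its own dynamic programming row and can therefore be assigned to its own worker. Queries are assumed to be non-empty strings over the toy alphabet `A`/`C`.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy two-state HMM in log space (hypothetical values, NOT the Plan7 model).
STATES = ("M", "I")
LOG_START = {"M": math.log(0.8), "I": math.log(0.2)}
LOG_TRANS = {
    "M": {"M": math.log(0.9), "I": math.log(0.1)},
    "I": {"M": math.log(0.5), "I": math.log(0.5)},
}
LOG_EMIT = {
    "M": {"A": math.log(0.7), "C": math.log(0.3)},
    "I": {"A": math.log(0.4), "C": math.log(0.6)},
}


def viterbi_score(query):
    """Return the log-probability of the best state path for one query."""
    # One row of the dynamic programming matrix: best score ending in each state.
    row = {s: LOG_START[s] + LOG_EMIT[s][query[0]] for s in STATES}
    for symbol in query[1:]:
        # Each new row depends only on the previous row of the same query.
        row = {
            s: max(row[p] + LOG_TRANS[p][s] for p in STATES) + LOG_EMIT[s][symbol]
            for s in STATES
        }
    return max(row.values())


def score_all(queries, workers=4):
    """Score every query independently, one task per query."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(viterbi_score, queries))
```

Because no state is shared between queries, `score_all(["AC", "CCA"])` returns the same values as calling `viterbi_score` on each query in turn. The thread pool here only stands in for the one-thread-per-query mapping; it is not expected to reproduce the speedups reported for the GPU.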
CITATION
Saddam Quirem, Fahian Ahmed, Byeong Kil Lee, "CUDA acceleration of P7Viterbi algorithm in HMMER 3.0", IEEE International Performance Computing and Communications Conference, pp. 1-2, 2011, doi:10.1109/PCCC.2011.6108104