2011 IEEE 17th International Conference on Parallel and Distributed Systems
Automatic FFT Performance Tuning on OpenCL GPUs
Tainan, Taiwan
December 07-December 09
ISBN: 978-0-7695-4576-9
BibTeX:
@article{10.1109/ICPADS.2011.32,
  author    = {Yan Li and Yunquan Zhang and Haipeng Jia and Guoping Long and Ke Wang},
  title     = {Automatic FFT Performance Tuning on OpenCL GPUs},
  journal   = {Parallel and Distributed Systems, International Conference on},
  volume    = {0},
  year      = {2011},
  issn      = {1521-9097},
  pages     = {228-235},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/ICPADS.2011.32},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
}
Many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, have been revolutionized by Fourier methods. The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT) and its inverse. Emerging high-performance computing architectures, such as GPUs, seek much higher performance and efficiency by exposing a hierarchy of distinct memories to programmers. However, the complexity of GPU programming poses a significant challenge. In this paper, based on the Kronecker product form of multi-dimensional FFTs, we propose an automatic performance tuning framework for various OpenCL GPUs. Several key techniques for GPU programming on AMD and NVIDIA GPUs are also identified. Our OpenCL FFT library achieves 1.5 to 4 times, 1.5 to 40 times, and up to 1.4 times the performance of clAmdFft 1.0 for 1D, 2D, and 3D FFTs, respectively, on an AMD GPU, and its overall performance is within 90% of CUFFT 4.0 on two NVIDIA GPUs.
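The Kronecker product form mentioned in the abstract can be illustrated with a small sketch. It assumes the standard identity that a 2D DFT of an m x n array equals (DFT_m ⊗ DFT_n) applied to the row-major flattening of the array, and it checks that against the usual row-column method. The function names are hypothetical, the code is plain Python rather than OpenCL, and it uses dense O((mn)^2) arithmetic; it is not the paper's tuning framework or its FFT kernels.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(c * x for c, x in zip(row, v)) for row in m]


def dft2_kron(x):
    """2D DFT via (DFT_m kron DFT_n) applied to the row-major flattening."""
    m, n = len(x), len(x[0])
    flat = [val for row in x for val in row]
    y = matvec(kron(dft_matrix(m), dft_matrix(n)), flat)
    # Reshape the flat result back into m rows of n entries.
    return [y[r * n:(r + 1) * n] for r in range(m)]


def dft2_row_column(x):
    """Reference: 1D DFTs along every row, then along every column."""
    m, n = len(x), len(x[0])
    fn, fm = dft_matrix(n), dft_matrix(m)
    rows = [matvec(fn, row) for row in x]
    cols = [matvec(fm, [rows[r][c] for r in range(m)]) for c in range(n)]
    # cols[c][r] holds output element (r, c); transpose back to row-major.
    return [[cols[c][r] for c in range(n)] for r in range(m)]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]
    a, b = dft2_kron(x), dft2_row_column(x)
    err = max(abs(a[r][c] - b[r][c]) for r in range(2) for c in range(3))
    print("max difference:", err)
```

The two routines agree up to floating-point rounding. A practical library would never form the Kronecker matrix explicitly; the factored form matters because it shows how a multi-dimensional transform decomposes into batches of 1D transforms, which is the kind of structure a tuner can map onto a GPU memory hierarchy.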
Index Terms:
FFT, DFT, GPU, OpenCL, Auto-tuning
Citation:
Yan Li, Yunquan Zhang, Haipeng Jia, Guoping Long, Ke Wang, "Automatic FFT Performance Tuning on OpenCL GPUs," icpads, pp.228-235, 2011 IEEE 17th International Conference on Parallel and Distributed Systems, 2011
