Parallel Architectures, Algorithms and Programming, International Symposium on (2010)
Dalian, Liaoning, China
Dec. 18, 2010 to Dec. 20, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PAAP.2010.22
GPGPU has recently been widely adopted in the High Performance Computing (HPC) field. However, limited global memory bandwidth poses a major challenge to GPGPU programmers trying to exploit parallelism on CPU-GPU heterogeneous platforms. In this paper, we choose SWIM, a typical memory-intensive application from the SPEC OMP 2001 benchmark suite, as a case study. We optimize the performance and energy consumption of the application using different memory access mechanisms, and present optimization methods including matrix transposition and kernel fusion. Experimental results on a platform with an Intel Core™ i920 CPU and a GeForce GTX 295 GPU show that the proposed optimizations achieve a speedup of 8.7× over the original OpenMP program and reduce energy consumption by 83% for a problem size of 2048 × 2048.
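The two optimization methods named in the abstract can be illustrated with a minimal CPU-side sketch in C++. This is not the paper's GPU code: the function names (`transpose`, `scaled_sum_unfused`, `scaled_sum_fused`) and the arithmetic are hypothetical, chosen only to show the general idea. Transposition rearranges a row-major matrix so that accesses which were strided become contiguous, and fusion merges two passes over the data into one so that the intermediate array is never written to and read back from memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transpose a row-major (rows x cols) matrix.
// After transposition, walking a former column touches contiguous memory.
std::vector<float> transpose(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
    return out;
}

// Unfused version: two separate passes. The intermediate array tmp is
// written in the first pass and read back in the second, which costs
// extra memory traffic (on a GPU, two kernel launches and a round trip
// through global memory).
std::vector<float> scaled_sum_unfused(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      float s) {
    assert(a.size() == b.size());
    std::vector<float> tmp(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        tmp[i] = a[i] + b[i];
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = s * tmp[i];
    return out;
}

// Fused version: one pass, the intermediate value stays in a local
// variable (a register) instead of going through memory.
std::vector<float> scaled_sum_fused(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    float s) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float sum = a[i] + b[i];
        out[i] = s * sum;
    }
    return out;
}
```

Both versions of the scaled sum return the same values; the fused one simply avoids storing and reloading `tmp`, which is where a memory-bound code stands to gain. How much either technique helps on a given GPU depends on the access patterns of the real kernels, which this sketch does not model.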
GPGPU, Optimization, Energy consumption, SWIM
X. Fang, Y. Tang, G. Wang and W. Yi, "A Case Study of SWIM: Optimization of Memory Intensive Application on GPGPU," Parallel Architectures, Algorithms and Programming, International Symposium on (PAAP), Dalian, Liaoning, China, 2010, pp. 123-129.