Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2011)
Galveston, Texas USA
Oct. 10, 2011 to Oct. 14, 2011
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2011.48
In recent years, accelerators such as GPGPUs have been widely adopted in industry and academia. Many studies of kernel computations report speedups of over 100 times using GPGPUs. For real applications in industry, however, data communication between CPUs and GPUs often dramatically slows down overall performance. We designed and implemented a runtime prefetching scheme that leverages the array region information provided by the compiler. We evaluated our prefetching system using the NPB SP benchmark, which requires frequent data communication among CPUs and GPUs when using two GPUs. With the prefetching scheme, SP achieves a 1.25 times speedup on a 4-core Intel Xeon Linux system with one Nvidia GTX 285 and one Tesla C1060.
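The abstract does not spell out the mechanism, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the compiler reports, for each kernel, the half-open index range of an array that the kernel will read, and that the runtime copies only the parts of that range not yet resident on the device while the current kernel is still running. All names (`missing_regions`, `prefetch`, `run_step`) are hypothetical, and plain Python lists stand in for host and device memory.

```python
from concurrent.futures import ThreadPoolExecutor


def missing_regions(needed, resident):
    """Return the sub-ranges of the half-open range `needed` that are not
    covered by any half-open range in `resident`."""
    lo, hi = needed
    gaps = []
    cursor = lo
    for r_lo, r_hi in sorted(resident):
        if r_hi <= cursor:
            continue  # this resident range ends before the uncovered part
        if r_lo >= hi:
            break  # this and all later ranges start after `needed` ends
        if r_lo > cursor:
            gaps.append((cursor, r_lo))
        cursor = max(cursor, r_hi)
        if cursor >= hi:
            break
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


def prefetch(host, device, resident, needed):
    """Copy only the missing parts of `needed` from host to device
    (lists stand in for the two memories; an assumption of this sketch)."""
    for lo, hi in missing_regions(needed, resident):
        device[lo:hi] = host[lo:hi]
        resident.append((lo, hi))


def run_step(pool, host, device, resident, next_region, kernel):
    """Run `kernel` while the region for the next kernel is copied."""
    pending = pool.submit(prefetch, host, device, resident, next_region)
    result = kernel()  # the current kernel overlaps with the copy
    pending.result()   # the next region must be resident before continuing
    return result
```

In this sketch the current kernel must only touch data that is already resident; on real hardware the overlap would come from asynchronous copies rather than a Python thread.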
Xionghui Hou, Li Chen, Baojiang Shou, "A Compiler-assisted Runtime-prefetching Scheme for Heterogeneous Platforms", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, p. 215, 2011, doi:10.1109/PACT.2011.48