2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) (2010)
Vienna, Austria
Sept. 11, 2010 to Sept. 15, 2010
ISBN: 978-1-5090-5032-1
pp: 193-204
Jaejin Lee , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Jungwon Kim , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Sangmin Seo , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Seungkyun Kim , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Jungho Park , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Honggyu Kim , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Thanh Tuan Dao , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Yongjin Cho , School of Computer Science and Engineering, Seoul National University, 151-744, Korea
Sung Jong Seo , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Seung Hak Lee , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Seung Mo Cho , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Hyo Jung Song , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Sang-Bum Suh , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
Jong-Deok Choi , Samsung Electronics Co., Nongseo-dong, Giheung-gu, Yongin-si, Geonggi-do 446-712, Korea
ABSTRACT
In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Instead, each accelerator core has a small internal local memory. To overcome the limited size of the local memory, our OpenCL runtime is based on software-managed caches and coherence protocols that guarantee OpenCL memory consistency. To boost performance, the runtime relies on three source-code transformation techniques performed by our OpenCL C source-to-source translator: work-item coalescing, web-based variable expansion, and preload-poststore buffering. Work-item coalescing serializes multiple SPMD-like tasks that execute concurrently in the presence of barriers and runs them sequentially on a single accelerator core. It requires web-based variable expansion to allocate local memory for private variables. Preload-poststore buffering is a buffering technique that eliminates the overhead of software cache accesses; combined with work-item coalescing, it has a synergistic effect on performance. We evaluate the framework on a system that consists of two Cell BE processors, and the experimental results show that the approach is promising.
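As an illustration of the work-item coalescing idea described in the abstract, the following minimal C sketch serializes the work-items of one work-group around a single barrier. It is not the translator's actual output: the kernel, the name kernel_coalesced, and the fixed WG_SIZE are hypothetical and chosen only for this example. The loop over work-items is split at the barrier, and the private variable x, which is live across the barrier, is expanded into an array with one slot per work-item.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 8 /* hypothetical work-group size, assumed for this sketch */

/*
 * Hypothetical OpenCL C kernel that the sketch below serializes:
 *
 *   __kernel void k(__global int *out, __global const int *in,
 *                   __local int *tmp) {
 *       int id = get_local_id(0);
 *       int x = in[id] * 2;              // private, live across the barrier
 *       tmp[id] = in[id] + 1;
 *       barrier(CLK_LOCAL_MEM_FENCE);
 *       out[id] = x + tmp[WG_SIZE - 1 - id];
 *   }
 */

/* Coalesced form: a single core runs every work-item of the work-group. */
void kernel_coalesced(int *out, const int *in)
{
    int tmp[WG_SIZE]; /* stands in for the __local buffer */
    int x[WG_SIZE];   /* private 'x' expanded to one slot per work-item */

    assert(out != NULL && in != NULL);

    /* Code region before the barrier, run for all work-items in turn. */
    for (int id = 0; id < WG_SIZE; ++id) {
        x[id] = in[id] * 2;
        tmp[id] = in[id] + 1;
    }

    /* The barrier is implied here: the first loop has fully completed. */

    /* Code region after the barrier, run for all work-items in turn. */
    for (int id = 0; id < WG_SIZE; ++id) {
        out[id] = x[id] + tmp[WG_SIZE - 1 - id];
    }
}
```

Calling kernel_coalesced with an input array of WG_SIZE integers fills the output array exactly as the concurrent kernel would for one work-group. Without the expansion of x into an array, the second loop would see only the value left by the last work-item of the first loop, which is why the abstract ties coalescing to variable expansion.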
INDEX TERMS
Preload-poststore buffering, OpenCL, Compilers, Runtime, Software-managed caches, Memory consistency, Work-item coalescing
CITATION
Jaejin Lee, Jungwon Kim, Sangmin Seo, Seungkyun Kim, Jungho Park, Honggyu Kim, Thanh Tuan Dao, Yongjin Cho, Sung Jong Seo, Seung Hak Lee, Seung Mo Cho, Hyo Jung Song, Sang-Bum Suh, Jong-Deok Choi, "An OpenCL framework for heterogeneous multicores with local memory", 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 193-204, 2010.