The Community for Technology Leaders
2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Haifa, Israel
Sept. 11, 2016 to Sept. 15, 2016
ISBN: 978-1-5090-5308-7
pp: 113-124
Byungchul Hong , KAIST, Korea
Gwangsun Kim , KAIST, Korea
Jung Ho Ahn , Seoul National University, Korea
Yongkee Kwon , SKHynix, Korea
Hongsik Kim , SKHynix, Korea
John Kim , KAIST, Korea
Recent technology advances in memory system design, along with 3D stacking, have made near-data processing (NDP) more feasible to accelerate different workloads. In this work, we explore near-data processing for a fundamental operation - linked-list traversal (LLT). We propose a new NDP architecture that does not change the existing sequential programming model and does not require any modification to the processor microarchitecture. Instead, we exploit the packetized interface between the core and the memory modules to off-load LLT for NDP. We leverage a system with multiple memory modules (e.g., hybrid memory cube (HMC) modules) interconnected with a memory network and our initial evaluation shows that simply off-loading LLT computation to near-memory can actually reduce performance because of the additional off-chip memory network channel traversals. Thus, we first propose NDP-aware data localization to exploit locality - including locality within a single memory module and memory vault - to minimize latency and improve energy efficiency. In order to improve overall throughput and maximize parallelism, we propose batching multiple LLT operations together to amortize the cost of NDP by utilizing the highly parallel execution of NDP processing units and the high bandwidth of 3D stacked DRAM. The combination of NDP-aware data localization and batching can provide significant improvement in performance and energy efficiency compared to host-processing.
Arrays, Acceleration, Three-dimensional displays, Random access memory, Parallel processing, Data processing

B. Hong, G. Kim, J. H. Ahn, Y. Kwon, H. Kim and J. Kim, "Accelerating linked-list traversal through near-data processing," 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), Haifa, Israel, 2016, pp. 113-124.
89 ms
(Ver 3.3 (11022016))