2015 International Conference on Parallel Architecture and Compilation (PACT) (2015)
San Francisco, CA, USA
Oct. 18, 2015 to Oct. 21, 2015
ISSN: 1089-795X
ISBN: 978-1-4673-9524-3
pp: 113-124
ABSTRACT
The end of Dennard scaling has made all systems energy-constrained. For data-intensive applications with limited temporal locality, the major energy bottleneck is data movement between processor chips and main memory modules. For such workloads, the best way to optimize energy is to place processing near the data in main memory. Advances in 3D integration provide an opportunity to implement near-data processing (NDP) without the technology problems that similar efforts had in the past. This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks. We develop simple but scalable hardware support for coherence, communication, and synchronization, and a runtime system that is sufficient to support analytics frameworks with complex data patterns while hiding all the details of the NDP hardware. Our NDP architecture provides up to 16x performance and energy advantage over conventional approaches, and 2.5x over recently proposed NDP systems. We also investigate the balance between processing and memory throughput, as well as the scalability and physical and logical organization of the memory system. Finally, we show that it is critical to optimize software frameworks for spatial locality, as it leads to 2.9x efficiency improvements for NDP.
INDEX TERMS
Hardware, Computer architecture, Three-dimensional displays, Instruction sets, Random access memory, Runtime, In-memory analytics, Near-data processing, Processing in memory, Energy efficiency
CITATION
Mingyu Gao, Grant Ayers, Christos Kozyrakis, "Practical Near-Data Processing for In-Memory Analytics Frameworks", 2015 International Conference on Parallel Architecture and Compilation (PACT), pp. 113-124, 2015, doi:10.1109/PACT.2015.22