Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 73-82
Ankit Sethia , Adv. Comput. Archit. Lab., Univ. of Michigan - Ann Arbor, Ann Arbor, MI, USA
Ganesh Dasika , ARM R&D, Austin, TX, USA
Mehrzad Samadi , Adv. Comput. Archit. Lab., Univ. of Michigan - Ann Arbor, Ann Arbor, MI, USA
Scott Mahlke , Adv. Comput. Archit. Lab., Univ. of Michigan - Ann Arbor, Ann Arbor, MI, USA
ABSTRACT
Modern graphics processing units (GPUs) combine large amounts of parallel hardware with fast context switching among thousands of active threads to achieve high performance. However, such designs do not translate well to mobile environments, where power constraints limit the amount of hardware that can be deployed. In this work, we investigate prefetching as a means to increase the energy efficiency of GPUs. Classically, CPU prefetching improves performance but worsens energy efficiency because unnecessary data is brought on chip. Our approach, called APOGEE, uses an adaptive mechanism to dynamically detect and adapt to the memory access patterns found in both the graphics and scientific applications that run on modern GPUs, achieving prefetching efficiencies of over 90%. Rather than examining threads in isolation, APOGEE uses adjacent threads to identify address patterns more efficiently and to dynamically adapt the timeliness of prefetching. The net effect of APOGEE is that fewer thread contexts are necessary to hide memory latency and sustain performance. This reduction in thread contexts and the associated hardware simplifies the design and reduces power. For graphics and GPGPU applications, APOGEE enables an 8X reduction in multithreading hardware while providing a performance benefit of 19%. This translates to a 52% increase in performance per watt over systems with high degrees of multithreading, and 33% over existing GPU prefetching techniques.
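The core idea in the abstract, detecting an address pattern by comparing adjacent threads rather than tracking each thread's history in isolation, can be illustrated with a toy software model. This is only an illustrative sketch, not the paper's hardware design: the function names, the single-warp view, and the assumption that all threads advance by the same per-iteration stride are ours, not APOGEE's.

```python
# Toy model of adjacent-thread pattern detection (illustrative only).
# If consecutive threads in a group access addresses separated by a
# constant delta, predict each thread's next access and "prefetch" it;
# if the inter-thread deltas disagree, issue no prefetches.

def detect_fixed_offset(addrs):
    """Return the common inter-thread delta, or None if accesses are irregular."""
    deltas = {b - a for a, b in zip(addrs, addrs[1:])}
    return deltas.pop() if len(deltas) == 1 else None

def prefetch_candidates(addrs, lookahead=1):
    """Predict the next per-thread addresses, assuming each thread advances
    past the whole group's span every iteration (a common coalesced pattern)."""
    delta = detect_fixed_offset(addrs)
    if delta is None:
        return []          # irregular pattern: prefetching would waste energy
    group_stride = delta * len(addrs)
    return [a + lookahead * group_stride for a in addrs]

# Example: 4 threads reading consecutive 4-byte words.
addrs = [0x1000, 0x1004, 0x1008, 0x100C]
print([hex(a) for a in prefetch_candidates(addrs)])
# → ['0x1010', '0x1014', '0x1018', '0x101c']
```

The "issue nothing when deltas disagree" branch mirrors the efficiency argument in the abstract: suppressing prefetches for irregular patterns is what keeps useless traffic, and hence wasted energy, low.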
INDEX TERMS
Prefetching, graphics processing units, energy efficiency, multithreading
CITATION
Ankit Sethia, Ganesh Dasika, Mehrzad Samadi, Scott Mahlke, "APOGEE: Adaptive prefetching on GPUs for energy efficiency", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 73-82, 2013, doi:10.1109/PACT.2013.6618805