Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 83-92
Jose-Maria Arnau, Comput. Archit. Dept., Univ. Politec. de Catalunya, Barcelona, Spain
Joan-Manuel Parcerisa, Comput. Archit. Dept., Univ. Politec. de Catalunya, Barcelona, Spain
Polychronis Xekalakis, Intel Labs., Intel Corp., Hillsboro, CA, USA
ABSTRACT
Perhaps one of the most important design aspects for smartphones and tablets is improving their energy efficiency. Unfortunately, rich media content applications typically put significant pressure on the GPU's memory subsystem. In this paper we propose a novel means of dramatically improving the energy efficiency of these devices for this popular type of application. The main hurdle in doing so is that GPUs require a significant amount of memory bandwidth in order to fetch all the necessary textures from memory. Although consecutive frames tend to operate on the same textures, their re-use distances are so large that, to the caches, fetching textures appears to be a streaming operation. Traditional designs increase the degree of multi-threading and the memory bandwidth as a means of improving performance. In order to meet the energy efficiency standards required by the mobile market, we need a different approach. We thus propose a technique which we term Parallel Frame Rendering (PFR). Under PFR, we split the GPU into two clusters, which render two consecutive frames in parallel. PFR exploits the high degree of similarity between consecutive frames to save memory bandwidth by improving texture locality. Since the physics part of the rendering has to be computed sequentially for two consecutive frames, this naturally leads to an increase in input latency for PFR compared with traditional systems. However, we argue that this is rarely an issue, as the user interface in these devices is much slower than that of desktop systems. Moreover, we show that we can design reactive forms of PFR that allow us to bound the lag observed by the end user, thus maintaining the highest user experience when necessary. Overall, we show that PFR can achieve memory bandwidth savings of 28% with only minimal loss in system responsiveness.
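The locality argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator: it assumes a single fully associative LRU texture cache shared by both clusters, and two consecutive frames that touch exactly the same textures in the same order. The names `count_misses`, `sequential_stream`, `parallel_stream`, `NUM_TEXTURES`, and `CACHE_CAPACITY` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses of a fully associative LRU cache over a stream of texture ids."""
    cache = OrderedDict()
    misses = 0
    for tex in accesses:
        if tex in cache:
            cache.move_to_end(tex)          # hit: mark as most recently used
        else:
            misses += 1                     # miss: texture is fetched from memory
            cache[tex] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return misses


def sequential_stream(num_textures):
    """Frame N is rendered completely, then frame N+1 (conventional rendering)."""
    frame = list(range(num_textures))
    return frame + frame


def parallel_stream(num_textures):
    """Both frames advance in lockstep, so each texture is touched twice in a row
    (an idealized stand-in for two clusters rendering consecutive frames)."""
    stream = []
    for tex in range(num_textures):
        stream.extend([tex, tex])
    return stream


# Assumed sizes: the per-frame texture working set is much larger than the cache.
NUM_TEXTURES = 64
CACHE_CAPACITY = 8

seq_misses = count_misses(sequential_stream(NUM_TEXTURES), CACHE_CAPACITY)
par_misses = count_misses(parallel_stream(NUM_TEXTURES), CACHE_CAPACITY)
print(f"sequential misses: {seq_misses}, parallel misses: {par_misses}")
```

In this idealized setting the sequential order misses on every access, because each texture is evicted before the next frame reaches it, while the lockstep order misses only once per texture. The resulting halving of misses is a best case for the toy model. Real consecutive frames are similar rather than identical and the two clusters do not run in perfect lockstep, which is consistent with the smaller 28% saving reported in the abstract.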
INDEX TERMS
Rendering (computer graphics), Graphics processing units, Tiles, Bandwidth, Mobile communication, Switches, Memory management, thread-level parallelism, GPGPUs, scheduling
CITATION
Jose-Maria Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis, "Neither more nor less: optimizing thread-level parallelism for GPGPUs", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 83-92, 2013, doi:10.1109/PACT.2013.6618806