2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) (2010)
Sept. 11, 2010 to Sept. 15, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/
Ali Bakhoda, University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, Canada
John Kim, KAIST, Department of Computer Science, Daejeon, Korea
Tor M. Aamodt, University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, Canada
There has been little work investigating the overall performance impact of on-chip communication in manycore compute accelerators. In this paper we evaluate the performance of a GPU-like compute accelerator running CUDA workloads and consisting of compute nodes, an interconnection network, and the graphics DRAM memory system, using detailed cycle-level simulation. First, we study the performance of a baseline architecture employing a scalable mesh network. We then propose several microarchitectural techniques that exploit the communication characteristics of these applications while providing a cost-effective (i.e., low-area) on-chip network. Instead of increasing costly bisection bandwidth, we increase the number of injection ports at the memory controller router nodes to increase terminal bandwidth at these few nodes. In addition, we propose a novel “checkerboard” on-chip network that alternates between conventional full-routers and half-routers with limited connectivity. This network is enabled by the limited communication of the many-to-few traffic pattern. We describe a minimal routing algorithm for the checkerboard network that does not increase the hop count.
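The checkerboard idea can be illustrated with a small sketch. The abstract does not define what "limited connectivity" means, so the code below assumes, purely for illustration, that a half-router can forward a packet straight through but cannot turn it from one dimension to the other. The placement rule (full-routers where x + y is even) and all function names are hypothetical. The sketch only checks whether a single-turn dimension-order route (XY or YX) places its turn at a full-router. It is not the routing algorithm described in the paper.

```python
def is_full_router(x, y):
    """Assumed checkerboard placement: full-routers where (x + y) is even."""
    return (x + y) % 2 == 0


def turn_node(src, dst, order):
    """Node where a dimension-order route changes dimension, or None if it never turns."""
    (sx, sy), (dx, dy) = src, dst
    if sx == dx or sy == dy:
        return None  # straight-line route, no turn needed
    # XY travels along x first, so it turns at (dx, sy); YX turns at (sx, dy).
    return (dx, sy) if order == "XY" else (sx, dy)


def pick_single_turn_route(src, dst):
    """Return 'XY' or 'YX' if that route turns at a full-router (or not at all), else None."""
    for order in ("XY", "YX"):
        t = turn_node(src, dst, order)
        if t is None or is_full_router(*t):
            return order
    return None  # both candidate turn nodes are half-routers under this assumption


def hop_count(src, dst):
    """Minimal hop count on a mesh is the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])
```

Under these assumptions, either order keeps the hop count at the Manhattan distance whenever it is usable. When neither single-turn route is available (for example, between two full-routers that are an odd number of hops apart in both dimensions), this sketch returns None; how the paper handles such cases is not stated in the abstract.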
A. Bakhoda, J. Kim and T. M. Aamodt, "On-chip network design considerations for compute accelerators," 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, 2010, pp. 535-536.