2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
ISBN: 978-1-5090-6609-4
pp: 75-85
Ronald G. Dreslinski, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Thomas Manville, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Korey Sewell, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Reetuparna Das, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Nathaniel Pinckney, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Sudhir Satpathy, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
David Blaauw, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Dennis Sylvester, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
Trevor Mudge, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, 48109, USA
ABSTRACT
With multi-core processors now mainstream, the shift to many-core processors poses a new set of design challenges. In particular, the scalability of coherence protocols remains a significant challenge. While complex Network-on-Chip interconnect fabrics have been proposed and in some cases implemented, most of industry has slowly evolved existing coherence solutions to meet the needs of a growing number of cores. Industry's slow adoption of Network-on-Chip designs is due in large part to the significant effort needed to design and verify such systems. However, simply scaling bus-based coherence is not straightforward either, because contention and latency on the bus increase at large core counts. This paper proposes a new architecture, XPoint, which scales to 64-core systems without modifying existing bus-based snooping coherence protocols. XPoint employs interleaved cache structures, together with detailed floorplanning and system analysis, to reduce contention at high core counts. Results show that the XPoint system achieves, on average, speedups of 28× and 35× over a single-core design on the Splash2 benchmarks for 32- and 64-core systems, respectively (a 1.6× improvement over a 64-core conventional bus). XPoint is also evaluated as a 3D stacked system to further reduce bus latency. Results show speedups of 29× and 45× for 32- and 64-core systems, respectively (a 2.1× improvement over a 64-core conventional bus and within 8% of the speedup of a 64-core system with an ideal interconnect). Measurements also show that the XPoint system decreases the bus contention of a 64-core system to only 13% higher than that of an 8-core design (a 29× improvement over a 64-core conventional bus).
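As a reading aid, the speedups quoted in the abstract can be turned into parallel efficiencies (speedup divided by core count) and checked for internal consistency. The sketch below is illustrative arithmetic only and is not taken from the paper: the function and variable names are hypothetical, and it assumes the quoted 1.6× and 2.1× improvements are ratios of 64-core speedups over the same conventional-bus baseline.

```python
# Average Splash2 speedups over a single core, as quoted in the abstract.
# Keys are (configuration, core count); the labels are illustrative names.
reported = {
    ("XPoint 2D", 32): 28.0,
    ("XPoint 2D", 64): 35.0,
    ("XPoint 3D", 32): 29.0,
    ("XPoint 3D", 64): 45.0,
}


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / cores


def implied_baseline_speedup(speedup, improvement):
    """Baseline speedup implied by an 'improvement over baseline' factor.

    Assumption: the improvement is a simple ratio of speedups.
    """
    return speedup / improvement


efficiencies = {
    key: parallel_efficiency(speedup, key[1])
    for key, speedup in reported.items()
}

# Both quoted improvements refer to a 64-core conventional bus, so the two
# implied baseline speedups should roughly agree if the assumption holds.
bus_from_2d = implied_baseline_speedup(reported[("XPoint 2D", 64)], 1.6)
bus_from_3d = implied_baseline_speedup(reported[("XPoint 3D", 64)], 2.1)

if __name__ == "__main__":
    for (config, cores), eff in efficiencies.items():
        print(f"{config}, {cores} cores: efficiency {eff:.1%}")
    print(f"Implied 64-core bus speedup: {bus_from_2d:.1f} (2D), {bus_from_3d:.1f} (3D)")
```

Under these assumptions, the 2D design keeps about 88% efficiency at 32 cores but about 55% at 64 cores, while 3D stacking raises the 64-core figure to about 70%. Both quoted improvement factors imply a 64-core conventional-bus speedup of roughly 21× to 22×, so the quoted figures are mutually consistent.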
INDEX TERMS
Protocols, Coherence, Two dimensional displays, Three-dimensional displays, Benchmark testing, Delays, Industries
CITATION
Ronald G. Dreslinski, Thomas Manville, Korey Sewell, Reetuparna Das, Nathaniel Pinckney, Sudhir Satpathy, David Blaauw, Dennis Sylvester, Trevor Mudge, "XPoint cache: Scaling existing bus-based coherence protocols for 2D and 3D many-core systems", 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), vol. 00, pp. 75-85, 2012.