Sixth International Symposium on High-Performance Computer Architecture (HPCA-6) (2000)
Toulouse, France
Jan. 8, 2000 to Jan. 12, 2000
ISBN: 0-7695-0550-3
pp: 145
Douglas Joseph, IBM Research
Anthony-Trung Nguyen, University of Illinois at Urbana-Champaign
Ashwini Nanda, IBM Research
Maged Michael, IBM Research
Recent research shows that the occupancy of coherence controllers is a major performance bottleneck for distributed cache-coherent shared-memory multiprocessors. In this paper we study three approaches to alleviating this problem in hardwired coherence controllers: multiple protocol engines, pipelined protocol engines, and split request-response streams. Split request-response streams are a novel contribution of this paper. The performance of pipelining in the context of coherence controllers has not previously been presented in the literature. Multiple protocol engines have not been studied in the context of hardwired controllers, except to a limited extent in an earlier study of ours. Using both commercial and scientific benchmarks on detailed simulation models, we present experimental results showing that each mechanism is highly effective, reducing controller occupancy by as much as 66% and improving execution time by as much as 51% for applications with high communication bandwidth requirements. A combination of mechanisms reduces controller occupancy and execution time further, by as much as 78% and 61%, respectively. Our results show that applying any of the parallel mechanisms in the coherence controllers allows integrating four times as many processors per coherence controller, thus reducing system cost, while maintaining or even exceeding the performance of systems with a larger number of coherence controllers.
DSM, NUMA, Coherence Controllers, Protocol Engines, Pipeline, Microarchitecture
Douglas Joseph, Anthony-Trung Nguyen, Ashwini Nanda, Maged Michael, "High-Throughput Coherence Controllers", Sixth International Symposium on High-Performance Computer Architecture (HPCA-6), pp. 145, 2000, doi:10.1109/HPCA.2000.824346