Second International Symposium on High-Performance Computer Architecture (HPCA) (1996)
San Jose, CA
Feb. 3, 1996 to Feb. 7, 1996
ISBN: 0-8186-7237-4
pp: 74
J.P. Singh , Comput. Syst. Lab., Stanford Univ., CA, USA
K. Olukotun , Comput. Syst. Lab., Stanford Univ., CA, USA
B.A. Nayfeh , Comput. Syst. Lab., Stanford Univ., CA, USA
ABSTRACT
As processor performance continues to increase, greater demands are placed on the bus and memory systems of small-scale shared-memory multiprocessors. In this paper, we investigate how to reduce these demands by organizing groups of processors into clusters, which are then connected by a shared global bus. We take advantage of the high-bandwidth, low-latency interconnections available from multichip module (MCM) technology to build clusters in which multiple high-performance processors share an L2 cache. MCM technology allows significantly lower shared-cache access times and higher shared-cache-to-processor bandwidth than is possible with printed circuit board (PCB) designs. Our results show that for an eight-processor bus-based system, bus contention can be a large portion of the overall execution time, and that clustering can eliminate much or all of it. Clustering also tends to reduce read stall times, owing to shared working set effects and a reduction in the effect of communication misses. The same is true for two- and four-processor systems, although to a lesser extent. Overall, we find that clustering can result in significant performance gains for applications that heavily utilize the memory system.
INDEX TERMS
shared memory systems; cache storage; performance evaluation; shared-cache clustering; small-scale shared-memory multiprocessors; processor performance; shared global bus; high-bandwidth; low-latency interconnections; multichip module; L2 cache; bus contention; memory system
CITATION
J.P. Singh, K. Olukotun, B.A. Nayfeh, "The impact of shared-cache clustering in small-scale shared-memory multiprocessors", Second International Symposium on High-Performance Computer Architecture (HPCA), pp. 74, 1996, doi:10.1109/HPCA.1996.501175