Issue No. 02 - February (1999 vol. 48)
DOI: http://doi.ieeecomputersociety.org/10.1109/12.752663
<p><b>Abstract</b>—Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are less effective at reducing memory stall time. Consequently, despite the inherent latency-tolerance features of ILP processors, we find memory system performance to be a larger bottleneck and parallel efficiencies to be generally poorer in ILP-based multiprocessors than in previous-generation multiprocessors. The main reasons for these deficiencies are insufficient opportunities in the applications to overlap multiple load misses and increased contention for resources in the system. We also find that software prefetching does not change the memory-bound nature of most of our applications on our ILP multiprocessor, mainly due to a large number of late prefetches and resource contention. Our results suggest the need for additional latency-hiding or latency-reducing techniques for ILP systems, such as software clustering of load misses and producer-initiated communication. </p>
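The abstract contrasts software prefetching, which the paper evaluates, with software clustering of load misses, which it proposes as a further remedy. The C sketch below is an illustration under stated assumptions, not code from the paper: the row-sum kernel, the function names and `PF_DIST` are hypothetical, the unroll factor of 4 is arbitrary, and the prefetch relies on the GCC/Clang builtin `__builtin_prefetch`. All three variants compute the same row sums; they differ only in how load misses are exposed to the processor.

```c
#include <stddef.h>

/* Hypothetical tuning knob: prefetch distance in elements. A real value
   depends on memory latency and loop cost and must be measured. */
#define PF_DIST 64

/* Baseline: out[i] = sum of row i of a row-major rows x cols matrix.
   Consecutive misses are a cache line apart, i.e. many iterations apart,
   so a limited instruction window overlaps few of them. */
void row_sums_base(const double *a, size_t rows, size_t cols, double *out) {
    for (size_t i = 0; i < rows; i++) {
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j];
        out[i] = s;
    }
}

/* Software prefetching: request the element PF_DIST positions ahead in
   memory order (possibly in the next row). For simplicity this issues one
   prefetch per element; a tuned version would issue one per cache line.
   If PF_DIST is too small the prefetch is late: the data arrives after
   the demand load has already stalled. */
void row_sums_prefetch(const double *a, size_t rows, size_t cols, double *out) {
    size_t n = rows * cols;
    for (size_t i = 0; i < rows; i++) {
        double s = 0.0;
        for (size_t j = 0; j < cols; j++) {
            size_t k = i * cols + j;
            if (k + PF_DIST < n) /* never form an out-of-bounds pointer */
                __builtin_prefetch(&a[k + PF_DIST], 0 /* read */, 3 /* high temporal locality */);
            s += a[k];
        }
        out[i] = s;
    }
}

/* Read-miss clustering via unroll-and-jam by 4: each inner iteration reads
   four different rows, so for rows longer than a cache line the misses to
   those rows are independent and adjacent in the instruction stream, where
   an out-of-order core can overlap them. */
void row_sums_clustered(const double *a, size_t rows, size_t cols, double *out) {
    size_t i = 0;
    for (; i + 4 <= rows; i += 4) {
        const double *r0 = a + i * cols;
        const double *r1 = r0 + cols, *r2 = r1 + cols, *r3 = r2 + cols;
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t j = 0; j < cols; j++) {
            s0 += r0[j]; s1 += r1[j]; s2 += r2[j]; s3 += r3[j];
        }
        out[i] = s0; out[i + 1] = s1; out[i + 2] = s2; out[i + 3] = s3;
    }
    for (; i < rows; i++) { /* leftover rows, handled one at a time */
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j];
        out[i] = s;
    }
}
```

Neither transformation changes the work done; both aim to raise the number of misses in flight. A prefetch distance that is too short reproduces the late-prefetch problem the abstract reports, while the benefit of clustering is capped by the instruction-window size and by how many outstanding misses the cache can track. Calling `row_sums_clustered(a, rows, cols, out)` fills `out` exactly as `row_sums_base` does, because each row is still summed in the same order.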
Index Terms: Shared-memory multiprocessors, instruction-level parallelism, software prefetching, performance evaluation.
S. Adve, H. Abdel-Shafi, V. S. Pai and P. Ranganathan, "The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors," in IEEE Transactions on Computers, vol. 48, no. 2, pp. 218-226, Feb. 1999.