Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2000)
Philadelphia, Pennsylvania
Oct. 15, 2000 to Oct. 19, 2000
ISSN: 1089-795X
ISBN: 0-7695-0622-4
pp: 81
Kevin Scott, University of Virginia
Jack Davidson, University of Virginia
ABSTRACT
Multimedia instruction set extensions have become a prominent feature of desktop microprocessor platforms, promising superior performance on a wide range of floating-point and integer signal processing, multimedia, and scientific applications. Nevertheless, the question remains whether these multimedia extensions can be applied to improve the performance of general, integer-intensive applications. The answer to this question is important and could be used to direct research and development of compiler algorithms and refinements to multimedia architectures. In this paper, we answer the question of whether integer programs exhibit enough sub-word level parallelism (SLP) to facilitate performance improvements through the use of multimedia extensions. Using a highly optimizing compiler and a simulator for an aggressive SLP architecture, we measured the available SLP in a range of integer benchmarks. Our measurements show that these applications exhibit significant levels of SLP. Using the most aggressive simulator settings, dynamic instruction count reductions of 17 to 36 percent were observed. However, detailed examination of the data indicates that much of this parallelism is equivalent to instruction-level parallelism (ILP) and could just as easily be exploited by a traditional ILP architecture. Our findings indicate that researchers should focus their efforts on exploiting SLP in floating-point-intensive and multimedia applications.
CITATION
Kevin Scott, Jack Davidson, "Exploring the Limits of Sub-Word Level Parallelism", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 81, 2000, doi:10.1109/PACT.2000.888333