Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques (2001)
Sept. 8, 2001 to Sept. 12, 2001
Vijay S. Pai , Rice University
Sarita V. Adve , University of Illinois
Abstract: A recent latency tolerance technique, read miss clustering, restructures code to send demand miss references in parallel to the underlying memory system. An alternate, widely-used latency tolerance technique is software prefetching, which initiates data fetches ahead of expected demand miss references by a certain distance. Since both techniques seem to target the same types of latencies and use the same system resources, it is unclear which technique is superior or if both can be combined. This paper shows that these two techniques are actually mutually beneficial, each helping to overcome limitations of the other. We perform our study for uniprocessor and multiprocessor configurations, in simulation and on a real machine (Convex Exemplar). Compared to prefetching alone (the state-of-the-art implemented in systems today), the combination of the two techniques reduces execution time an average of 21% across all cases studied in simulation and an average of 16% for 5 out of 10 cases on the Exemplar. The combination sees execution time reductions relative to clustering alone averaging 15% for 6 out of 11 cases in simulation and 20% for 6 out of 10 cases on the Exemplar.
V. S. Pai and S. V. Adve, "Comparing and Combining Read Miss Clustering and Software Prefetching," Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques(PACT), Barcelona, Spain, 2001, pp. 0292.