18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 14
Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe
Santa Fe, New Mexico
April 26-30, 2004
ISBN: 0-7695-2132-0
The gap between peak and delivered performance for scientific applications running on microprocessor-based systems has grown considerably in recent years. The inability to achieve the desired performance, even on a single processor, is often attributed to an inadequate memory system, but usually without identifying or quantifying a specific bottleneck. In this work, we use an adaptable synthetic benchmark to isolate the application characteristics that cause a significant drop in performance, giving application programmers and architects information about possible optimizations. Our adaptable probe, called sqmat, uses only four parameters to capture key characteristics of scientific workloads: working-set size, computational intensity, indirection, and irregularity. This paper describes the implementation of sqmat and uses its tunable parameters to evaluate four leading 64-bit microprocessors that are popular building blocks for current high-performance systems: Intel Itanium2, AMD Opteron, IBM Power3, and IBM Power4.
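To make the four parameters concrete, the following is a minimal illustrative sketch of a four-parameter probe. It is not the authors' sqmat implementation, which the abstract does not detail. It assumes a kernel that repeatedly squares a set of small matrices (an assumption suggested only by the probe's name), and every function and parameter name here (square, build_order, run_probe, n, num_mats, intensity, indirect, run_length) is hypothetical.

```python
import random


def square(m, n):
    """Return m*m for an n x n matrix stored row-major in a flat list."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += m[i * n + k] * m[k * n + j]
            out[i * n + j] = acc
    return out


def build_order(num_mats, indirect=False, run_length=1, seed=0):
    """Order in which the matrices are visited.

    indirect=False: direct, in-order traversal.
    indirect=True:  traversal through an index list. Runs of run_length
                    consecutive indices stay together and the runs are
                    shuffled, so a smaller run_length (must be >= 1)
                    gives a more irregular access pattern.
    """
    order = list(range(num_mats))
    if not indirect:
        return order
    runs = [order[i:i + run_length] for i in range(0, num_mats, run_length)]
    random.Random(seed).shuffle(runs)
    return [idx for run in runs for idx in run]


def run_probe(n, num_mats, intensity, indirect=False, run_length=1, seed=0):
    """Square each of num_mats n x n matrices `intensity` times.

    Assumed mapping to the abstract's four parameters:
      working-set size        -> num_mats * n * n elements
      computational intensity -> intensity (squarings per matrix loaded)
      indirection             -> indirect
      irregularity            -> run_length
    """
    mats = []
    for _ in range(num_mats):
        # Identity plus a 1.0 in the top-right corner: squaring this matrix
        # only doubles the corner entry, so values stay small and checkable.
        m = [0.0] * (n * n)
        for d in range(n):
            m[d * n + d] = 1.0
        if n > 1:
            m[n - 1] = 1.0
        mats.append(m)

    for idx in build_order(num_mats, indirect, run_length, seed):
        m = mats[idx]
        for _ in range(intensity):
            m = square(m, n)
        mats[idx] = m
    return mats
```

A real probe would be written in a compiled language and would time the traversal loop while sweeping one parameter at a time. This Python version only illustrates how the four knobs could interact and is not suitable for measuring hardware behavior.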
Citation:
Gorden Griem, Leonid Oliker, John Shalf, Katherine Yelick, "Identifying Performance Bottlenecks on Modern Microarchitectures Using an Adaptable Probe," 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 14, vol. 15, p. 255a, 2004