International Parallel and Distributed Processing Symposium (IPDPS'03)
Accessing Hardware Performance Counters in order to Measure the Influence of Cache on the Performance of Integer Sorting
Nice, France
April 22-26, 2003
ISBN: 0-7695-1926-1
Hardware performance counters are used to measure the impact of L1 data cache misses on the overall performance of six integer sorting algorithms. Most of them are cache-conscious algorithms that were recently introduced, are known to behave well according to previous simulations, or have not been explored at all. We demonstrate through experiments on an Athlon processor that, in practical cases, the fastest sorting algorithm is the one that strikes a good balance between L1 data cache misses and retired instructions. It is obtained neither with the implementation that gives the smallest number of misses nor with the one that gives the smallest number of instructions. The fastest algorithm in practice is a new flavour of mergesort that we have developed.
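The trade-off described in the abstract can be illustrated with a rough linear cost model, in which estimated cycles are retired instructions times a base cost plus L1 data cache misses times a miss penalty. This is only a sketch under stated assumptions: the model, the function names, the penalty of 20 cycles, and all counter values are hypothetical and are not taken from the paper or from measurements on an Athlon processor.

```python
# Hypothetical linear cost model; none of these numbers come from the paper.

def estimated_cycles(instructions, l1_misses,
                     cycles_per_instruction=1.0, miss_penalty=20.0):
    """Estimate run time in cycles from two counter values.

    cycles_per_instruction and miss_penalty are assumed constants,
    chosen only for illustration.
    """
    return instructions * cycles_per_instruction + l1_misses * miss_penalty


def fastest(candidates):
    """Return the name of the candidate with the lowest estimated cycles."""
    return min(candidates, key=lambda name: estimated_cycles(*candidates[name]))


# Made-up (retired instructions, L1 data cache misses) pairs for three
# imaginary sorting implementations.
candidates = {
    "fewest_instructions": (100_000_000, 12_000_000),
    "fewest_misses": (260_000_000, 2_000_000),
    "balanced": (150_000_000, 4_000_000),
}

if __name__ == "__main__":
    for name, (instructions, misses) in candidates.items():
        print(name, estimated_cycles(instructions, misses))
    print("fastest:", fastest(candidates))
```

With these invented values, the variant that minimises neither counter has the lowest estimated cost, which is the kind of outcome the abstract reports. On real hardware the two counters would be read through a performance-counter interface rather than assumed.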
Citation:
Christophe Cérin, Hazem Fkaier, Mohamed Jemni, "Accessing Hardware Performance Counters in order to Measure the Influence of Cache on the Performance of Integer Sorting," International Parallel and Distributed Processing Symposium (IPDPS'03), p. 274a, 2003.