19th International Conference of the Chilean Computer Science Society
Sorting on the SGI Origin 2000: Comparing MPI and Shared Memory Implementations
Talca, Chile
November 11-13, 1999
ISBN: 0-7695-0296-2
In this paper we analyze the Communication and Cache Conscious Radix sort algorithm, C3-Radix, using the distributed and the shared memory parallel programming models. C3-Radix was originally proposed for distributed memory computers. It builds on the classic Radix sort, exploiting the locality of the memory hierarchy and reducing the amount of communication. Here, we implement C3-Radix on the SGI Origin 2000 NUMA multiprocessor, using the Message Passing Interface (MPI) and the native shared memory directives of that computer to implement the two programming models under study. We give results for up to 16 processors and 64M 32-bit keys. The results show that the MPI implementation is faster for data sets that are small relative to the number of processors, while the shared memory implementation is faster for large data sets. In the paper, we explain the reasons for these different behaviors depending on the size of the data sets.
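The abstract states only that C3-Radix builds on the classic Radix sort. The authors' algorithm is not reproduced here. As background, the following is a minimal sequential sketch in C of a classic least-significant-digit (LSD) radix sort of 32-bit keys. It assumes 8-bit digits, which gives four counting-sort passes. The function name `radix_sort_u32` and the digit width are illustrative choices, not taken from the paper. The sketch does not model the cache-conscious or communication-reducing steps of C3-Radix, nor any MPI or shared memory parallelism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RADIX_BITS 8                  /* assumed digit width: 8 bits */
#define BUCKETS (1u << RADIX_BITS)    /* 256 buckets per pass */

/* Classic LSD radix sort of 32-bit unsigned keys (illustrative sketch).
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int radix_sort_u32(uint32_t *keys, size_t n) {
    assert(keys != NULL || n == 0);
    if (n < 2)
        return 0;                     /* nothing to sort */

    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    uint32_t *src = keys, *dst = tmp;
    for (unsigned shift = 0; shift < 32; shift += RADIX_BITS) {
        size_t count[BUCKETS] = {0};

        /* Histogram of the current digit. */
        for (size_t i = 0; i < n; i++)
            count[(src[i] >> shift) & (BUCKETS - 1)]++;

        /* Exclusive prefix sum: count[b] becomes bucket b's start index. */
        size_t offset = 0;
        for (unsigned b = 0; b < BUCKETS; b++) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }

        /* Stable scatter into dst, preserving order from earlier passes. */
        for (size_t i = 0; i < n; i++)
            dst[count[(src[i] >> shift) & (BUCKETS - 1)]++] = src[i];

        /* Swap buffers for the next pass. */
        uint32_t *t = src;
        src = dst;
        dst = t;
    }

    /* Four passes (an even number), so the sorted data ends up in keys. */
    free(tmp);
    return 0;
}
```

Each pass is a stable counting sort on one digit, so after the last pass the keys are ordered on all 32 bits. Keeping the digit at 8 bits keeps the per-pass count array small (256 entries). This is a common choice for locality, but it is only loosely related to the memory hierarchy concerns the abstract raises for C3-Radix.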
Index Terms:
Sorting, Radix sort, MPI, Shared Memory, SGI Origin 2000
Citation:
D. Jimenez-Gonzalez, E. Guinovart, J.-L. Larriba-Pey, J.J. Navarro, "Sorting on the SGI Origin 2000: Comparing MPI and Shared Memory Implementations," SCCC, pp. 209, 19th International Conference of the Chilean Computer Science Society, 1999