
Andrea C. Dusseau, David E. Culler, Klaus Erik Schauser, Richard P. Martin, "Fast Parallel Sorting Under LogP: Experience with the CM-5," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 8, pp. 791-805, August 1996.
BibTeX
@article{10.1109/71.532111,
  author = {Andrea C. Dusseau and David E. Culler and Klaus Erik Schauser and Richard P. Martin},
  title = {Fast Parallel Sorting Under LogP: Experience with the CM-5},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {7},
  number = {8},
  issn = {1045-9219},
  year = {1996},
  pages = {791--805},
  doi = {http://doi.ieeecomputersociety.org/10.1109/71.532111},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Index Terms: Massively parallel processing, models of parallel computation, sorting, algorithm implementation, communication performance, communication schedules, performance prediction, performance analysis, data layout.
Abstract: In this paper, we analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort) with the LogP model. LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P).
We show that the LogP model is a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. With an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes. We find that communication performance is oblivious to the distribution of the key values, whereas the local processor performance is not; some communication phases are sensitive to the ordering of keys due to contention. Finally, our analysis shows that overhead is the most critical communication parameter in the sorting algorithms.
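To make the role of the communication parameters concrete, the following is a minimal sketch of LogP-style cost estimates. It is not taken from the paper: the function names, the simplified balanced-exchange formula, and the numeric parameter values are all illustrative assumptions. The point-to-point and pipelined-send costs follow the usual LogP accounting (send overhead, latency, receive overhead, with consecutive sends spaced by the gap).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogP:
    """LogP machine parameters (time units are arbitrary but consistent)."""
    L: float  # communication latency
    o: float  # per-message processor overhead (paid on send and on receive)
    g: float  # gap: minimum interval between consecutive messages
    P: int    # number of processors


def point_to_point(m: LogP) -> float:
    """One small message: send overhead + latency + receive overhead."""
    return m.L + 2 * m.o


def pipelined_send(m: LogP, n: int) -> float:
    """Time until the last of n back-to-back messages from one sender arrives.

    Consecutive sends are spaced by max(g, o).
    """
    if n <= 0:
        return 0.0
    return (n - 1) * max(m.g, m.o) + point_to_point(m)


def balanced_exchange(m: LogP, n: int) -> float:
    """Assumed simplified model of a balanced, contention-free exchange.

    Each processor sends n messages and receives n messages, so it pays
    2*o of overhead per message pair, limited below by the gap g.
    """
    if n <= 0:
        return 0.0
    return n * max(m.g, 2 * m.o) + m.L


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    base = LogP(L=6.0, o=2.0, g=4.0, P=8)
    print("one message:      ", point_to_point(base))
    print("100 pipelined:    ", pipelined_send(base, 100))
    print("exchange, base:   ", balanced_exchange(base, 100))
    print("exchange, 2x L:   ", balanced_exchange(replace(base, L=12.0), 100))
    print("exchange, 2x o:   ", balanced_exchange(replace(base, o=4.0), 100))
```

Under these assumed values, doubling the latency barely changes the exchange estimate, while doubling the overhead nearly doubles it. This is consistent with the abstract's observation that overhead is the most critical communication parameter, although the paper's own analysis is more detailed than this sketch.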