
N.S. Sundar, D.N. Jayasimha, D.K. Panda, P. Sadayappan, "Hybrid Algorithms for Complete Exchange in 2D Meshes," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 12, pp. 1201-1218, December 2001.
@article{10.1109/71.970553,
  author = {N.S. Sundar and D.N. Jayasimha and D.K. Panda and P. Sadayappan},
  title = {Hybrid Algorithms for Complete Exchange in 2D Meshes},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {12},
  number = {12},
  issn = {1045-9219},
  year = {2001},
  pages = {1201--1218},
  doi = {http://doi.ieeecomputersociety.org/10.1109/71.970553},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Keywords: collective communication, complete exchange, combining, direct exchange, hybrid algorithms, message contention, barrier synchronization, mesh topology, wormhole routing
Parallel algorithms for several common problems, such as sorting and the FFT, involve a personalized exchange of data among all the processors. Past work on complete exchange has taken one of two broad approaches: direct exchange or indirect message combining. Combining approaches reduce the number of message startups, whereas direct exchange minimizes the volume of data transmitted. This paper presents a family of hybrid algorithms for wormhole-routed 2D meshes that effectively exploit the complementary strengths of these two approaches. The performance of hybrid algorithms that use Cyclic Exchange and Scott's Direct Exchange is studied using analytical models, simulation, and implementation on a Cray T3D system. The results show that hybrids achieve lower completion times than either pure algorithm for a range of mesh sizes, data block sizes, and message startup costs. It is also demonstrated that barriers can enhance performance by reducing message contention, whether or not the target system provides hardware support for barrier synchronization. The analytical models are shown to be useful in selecting the optimum hybrid for any given combination of system parameters (mesh size, message startup time, flit transfer time, and barrier cost) and the problem parameter (data block size).
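The startup-versus-volume trade-off can be made concrete with a deliberately simplified, contention-free cost model. This model is an assumption made for illustration only. It is not the paper's analytical model, which also accounts for mesh distance, link contention under wormhole routing, and barrier placement, and it is not the paper's Cyclic Exchange/Direct Exchange hybrid family. Assume p processors, each holding one block of m bytes for every other processor, a per-message startup time t_s, and a per-byte transfer time t_c. A multiphase schedule factors p = d_1 × d_2 × ... × d_k. In phase i, each processor exchanges with d_i - 1 partners, and each message combines p/d_i blocks, giving an estimated time T = sum over i of (d_i - 1)(t_s + (p/d_i)·m·t_c + t_b). A single phase (d_1 = p) is pure direct exchange with p - 1 startups. Splitting p into many small factors is maximal combining, which needs fewer startups but moves more bytes. The function names (`factorizations`, `multiphase_cost`, `best_schedule`) and the optional per-step barrier cost t_b are hypothetical.

```python
"""Toy multiphase complete-exchange cost model (an illustrative assumption,
not the analytical model from the paper)."""

from typing import Iterator, Optional, Tuple

# A schedule is a tuple of group sizes (d_1, ..., d_k) whose product is p.
Schedule = Tuple[int, ...]


def factorizations(n: int, max_factor: Optional[int] = None) -> Iterator[Schedule]:
    """Yield every factorization of n into factors > 1, largest factor first."""
    if max_factor is None:
        max_factor = n
    if n == 1:
        yield ()
        return
    for f in range(min(n, max_factor), 1, -1):
        if n % f == 0:
            # Restricting later factors to <= f avoids emitting permutations.
            for rest in factorizations(n // f, f):
                yield (f,) + rest


def multiphase_cost(p: int, schedule: Schedule, m: float,
                    t_s: float, t_c: float, t_b: float = 0.0) -> float:
    """Estimated completion time of one schedule under the toy model.

    In a phase with group size d, each processor performs d - 1 exchange
    steps, and every message carries p // d combined blocks of m bytes.
    t_b is a hypothetical barrier cost paid after every step (0 = none).
    Contention and mesh distance are ignored.
    """
    total = 0.0
    for d in schedule:
        blocks_per_message = p // d
        total += (d - 1) * (t_s + blocks_per_message * m * t_c + t_b)
    return total


def best_schedule(p: int, m: float, t_s: float, t_c: float,
                  t_b: float = 0.0) -> Tuple[float, Schedule]:
    """Return (cost, schedule) minimizing the toy model over all factorizations of p."""
    return min((multiphase_cost(p, s, m, t_s, t_c, t_b), s)
               for s in factorizations(p))


if __name__ == "__main__":
    p, t_s, t_c = 16, 100.0, 1.0  # arbitrary time units, chosen for illustration
    for m in (10, 50, 500):
        cost, schedule = best_schedule(p, m, t_s, t_c)
        print(f"m={m:>4}: best schedule {schedule}, estimated time {cost:.0f}")
```

With p = 16, t_s = 100, and t_c = 1, this toy model picks the fully combined schedule (2, 2, 2, 2) for m = 10, the two-phase schedule (4, 4) for m = 50, and pure direct exchange (16,) for m = 500. On a 4 × 4 mesh, the (4, 4) schedule corresponds to a row phase followed by a column phase. Under these simplifying assumptions only, it shows why an intermediate scheme can beat both extremes for mid-sized blocks.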
[1] B. Abali, F. Özgüner, and A. Bataineh, “Balanced Parallel Sort on Hypercube Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 5, pp. 572-581, May 1993.
[2] D.A. Bader, D.R. Helman, and J. JáJá, “Practical Parallel Algorithms for Personalized Communication and Integer Sorting,” Technical Report UMIACS-TR-95-101, Inst. for Advanced Computer Studies, Univ. of Maryland, 1995.
[3] V. Bala, J. Bruck, R. Cypher, P. Elustondo, A. Ho, C.T. Ho, S. Kipnis, and M. Snir, “CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers,” Proc. Eighth Int'l Parallel Processing Symp., IEEE, pp. 835-844, Apr. 1994.
[4] M. Barnett, S. Gupta, D.G. Payne, L. Shuler, R. van de Geijn, and J. Watts, “Building a High-Performance Collective Communication Library,” Proc. Scalable High Performance Computing Conf., pp. 835-844, 1994.
[5] D. Bertsekas, C. Ozveren, G. Stamoulis, P. Tseng, and J. Tsitsiklis, “Optimal Communication Algorithms for Hypercubes,” J. Parallel and Distributed Computing, vol. 11, pp. 263-275, 1991.
[6] S.H. Bokhari, “Complete Exchange on the iPSC/860,” Technical Report 91-4, Inst. for Computer Applications in Science and Eng., NASA Langley Research Center, Jan. 1991.
[7] S.H. Bokhari, “Multiphase Complete Exchange on a Circuit Switched Hypercube,” Proc. Int'l Conf. Parallel Processing, vol. 1, pp. 525-529, 1991.
[8] S.H. Bokhari, “Multiphase Complete Exchange on Paragon, SP2, and CS-2,” IEEE Parallel and Distributed Technology, pp. 45-59, Fall 1996.
[9] S.H. Bokhari and H. Berryman, “Complete Exchange on a Circuit Switched Mesh,” Proc. Scalable High Performance Computing Conf., pp. 300-306, 1992.
[10] E.A. Brewer and B.C. Kuszmaul, “How to Get Good Performance from the CM-5 Data Network,” Proc. Int'l Parallel Processing Symp., 1994.
[11] V.V. Dimakopoulos and N.J. Dimopoulos, “Optimal Total Exchange in Linear Arrays and Rings,” Proc. ISPAN'94 Int'l Symp. Parallel Architecture, Algorithms, and Networks, pp. 230-237, Kanazawa, Japan, Dec. 1994.
[12] A. Edelman, “Optimal Matrix Transposition and Bit Reversal on Hypercubes: All-to-All Personalized Communication,” J. Parallel and Distributed Computing, vol. 11, pp. 328-331, 1991.
[13] Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard,” technical report, Univ. of Tennessee, Knoxville, June 1995. http://www.umiacs.umd.edu/research/EXPAR/papers/3548.html http://www.mcs.anl.gov/mpi/mpireport1.1mpireport.html
[14] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing. The MIT Press, 1994.
[15] S. Gupta, S. Hawkinson, and B. Baxter, “A Binary Interleaved Algorithm for Complete Exchange on a Mesh Architecture,” technical report, Intel Corp., 1993. Personal communication.
[16] S.L. Johnsson and C.T. Ho, “Spanning Graphs for Optimum Broadcasting and Personalized Communication in Hypercubes,” IEEE Trans. Computers, vol. 38, no. 9, pp. 1,249-1,268, Sept. 1989.
[17] M. Kaufmann, J. Sibeyn, and T. Suel, “Derandomizing Algorithms for Routing and Sorting on Meshes,” Proc. Fifth Ann. ACM-SIAM Symp. Discrete Algorithms, pp. 669-679, Arlington, Va., 1994.
[18] Y.D. Lyuu and E. Schenfeld, “Total Exchange on a Reconfigurable Parallel Architecture,” Proc. Fifth IEEE Symp. Parallel and Distributed Processing, pp. 2-10, 1993.
[19] P.K. McKinley, Y.J. Tsai, and D.F. Robinson, “A Survey of Collective Communication in Wormhole-Routed Massively Parallel Computers,” Technical Report MSU-CPS-94-35, Michigan State Univ., June 1994. ftp://ftp.cps.msu.edu/pub/crg/PAPERSmsucps9435.ps.Z
[20] S.R. Öhring and S.K. Das, “Efficient Communication in the Folded Petersen Interconnection Networks,” Proc. Sixth Int'l Parallel Architectures and Languages Europe Conf., pp. 25-36, 1994.
[21] T. Schmiermund and S.R. Seidel, “A Communication Model for the Intel iPSC/2,” Technical Report CS-TR 90-02, Dept. of Computer Science, Michigan Tech. Univ., Apr. 1990.
[22] H.D. Schwetman, “Introduction to Process-Oriented Simulation and CSIM,” Proc. Winter Simulation Conf., pp. 154-157, Dec. 1990.
[23] D.S. Scott, “Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies,” Proc. Sixth Conf. Distributed Memory Concurrent Computers, pp. 398-403, 1991.
[24] T. Suel, “Routing and Sorting on Meshes with Row and Column Buses,” Technical Report UTA//CSTR9409, Dept. of Computer Sciences, Univ. of Texas at Austin, Oct. 1994.
[25] Y.J. Suh and S. Yalamanchili, “Algorithms for All-to-All Personalized Exchange in 2D and 3D Tori,” Proc. 10th Int'l Parallel Processing Symp., pp. 808-814, Apr. 1996.
[26] N.S. Sundar, D.N. Jayasimha, D.K. Panda, and P. Sadayappan, “Complete Exchange in 2D Meshes,” Proc. Scalable High Performance Computing Conf., pp. 406-413, 1994.
[27] R. Take, “A Routing Method for All-to-All Burst on Hypercube Networks,” Proc. 35th Nat'l Conf. Information Processing Soc. of Japan, pp. 151-152, 1987 (in Japanese).
[28] R. Thakur and A. Choudhary, “All-to-All Communication on Meshes with Wormhole Routing,” Proc. Eighth Int'l Parallel Processing Symp., pp. 561-565, Apr. 1994.
[29] R. Thakur, A. Choudhary, and G. Fox, “Complete Exchange on a Wormhole Routed Mesh,” Technical Report SCCS-505, Northeast Parallel Architectures Center, Syracuse Univ., July 1993. http://www.npac.syr.edu/techreports/html/0500abs0505.html
[30] R. Thakur, R. Ponnusamy, A. Choudhary, and G. Fox, “Complete Exchange on the CM-5 and Touchstone Delta,” J. Supercomputing, vol. 8, pp. 305-328, 1995.