Hybrid Algorithms for Complete Exchange in 2D Meshes
December 2001 (vol. 12 no. 12)
pp. 1201-1218

Parallel algorithms for several common problems, such as sorting and the FFT, involve a personalized exchange of data among all the processors. Past work on complete exchange has taken one of two broad approaches: direct exchange or indirect message combining. While combining approaches reduce the number of message startups, direct exchange minimizes the volume of data transmitted. This paper presents a family of hybrid algorithms for wormhole-routed 2D meshes that can effectively utilize the complementary strengths of these two approaches to complete exchange. The performance of hybrid algorithms using Cyclic Exchange and Scott's Direct Exchange is studied using analytical models, simulation, and implementation on a Cray T3D system. The results show that hybrids achieve lower completion times than either pure algorithm for a range of mesh sizes, data block sizes, and message startup costs. It is also demonstrated that barriers may be used to enhance performance by reducing message contention, whether or not the target system provides hardware support for barrier synchronization. The analytical models are shown to be useful in selecting the optimum hybrid for any given combination of system parameters (mesh size, message startup time, flit transfer time, and barrier cost) and the problem parameter (data block size).
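
The trade-off between startup count and data volume can be illustrated with a minimal sketch, assuming a simple linear cost model (startups times startup time, plus flits times flit transfer time, plus barriers times barrier cost). This is not the paper's analytical model: the `Profile` class, the function names, and every number in `PROFILES` are hypothetical and chosen only to show how the preferred algorithm can shift with data block size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical cost profile of one exchange algorithm (not from the paper)."""
    name: str
    startups: int          # message startups on the critical path
    blocks_sent: int       # data blocks transmitted on the critical path
    barriers: int = 0      # barrier synchronizations used


def completion_time(p, block_flits, t_startup, t_flit, t_barrier=0.0):
    """Assumed linear model: startup cost + transfer cost + barrier cost."""
    return (p.startups * t_startup
            + p.blocks_sent * block_flits * t_flit
            + p.barriers * t_barrier)


def best_profile(profiles, block_flits, t_startup, t_flit, t_barrier=0.0):
    """Return the profile with the lowest modeled completion time."""
    return min(
        profiles,
        key=lambda p: completion_time(p, block_flits, t_startup, t_flit, t_barrier),
    )


# Invented illustrative profiles: few startups but more data (combining-like),
# many startups but minimal data (direct-like), and an in-between hybrid.
PROFILES = [
    Profile("direct-like", startups=63, blocks_sent=63),
    Profile("combining-like", startups=12, blocks_sent=192),
    Profile("hybrid-like", startups=24, blocks_sent=96),
]

if __name__ == "__main__":
    for block_flits in (1, 50, 1000):
        winner = best_profile(PROFILES, block_flits, t_startup=100.0, t_flit=1.0)
        print(block_flits, winner.name)
```

Under these made-up parameters, the combining-like profile is preferred for very small blocks, the direct-like profile for very large blocks, and the hybrid-like profile in between, which mirrors the qualitative claim of the abstract without reproducing its actual models.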


Index Terms:
Collective communication, complete exchange, combining, direct exchange, hybrid algorithms, message contention, barrier synchronization, mesh topology, wormhole routing
N.S. Sundar, D.N. Jayasimha, D.K. Panda, P. Sadayappan, "Hybrid Algorithms for Complete Exchange in 2D Meshes," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 12, pp. 1201-1218, Dec. 2001, doi:10.1109/71.970553