Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach
April 1997 (vol. 8 no. 4)
pp. 380-396

Abstract: All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. Several complete exchange algorithms have been proposed in the literature for wormhole-routed meshes. However, when applied to tori, these algorithms cannot take advantage of the wrap-around interconnections to reduce the latency of complete exchange. In this paper, a new diagonal-propagation approach is proposed to develop a set of complete exchange algorithms for 2D and 3D tori. This approach exploits the symmetric interconnections of tori and yields a communication schedule consisting of several contention-free phases. The algorithms are indirect in nature and use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus, which is twice that of an equal-sized mesh, to achieve complete exchange in almost half the best known complete exchange time on an equal-sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (with barriers introduced between phases) reduce the latency of complete exchange for large messages, while asynchronous implementations are better for smaller messages.
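The bisection claim in the abstract can be checked by direct counting. The following is a minimal sketch, not code from the paper: the function name `crossing_links` and the choice to split an n x n grid into left and right halves are assumptions made for illustration. It counts the undirected links cut by that split, with and without wrap-around links.

```python
def crossing_links(n, wraparound):
    """Count undirected links cut when an n x n grid is split into
    a left half (columns 0..n/2-1) and a right half (columns n/2..n-1).

    wraparound=False models a mesh, wraparound=True models a torus.
    Illustrative helper only; assumes n is even and n >= 4.
    """
    assert n % 2 == 0 and n >= 4
    half = n // 2
    links = set()
    for r in range(n):
        for c in range(n):
            # Link to the right-hand neighbour (wraps to column 0 on a torus).
            if c + 1 < n:
                links.add(frozenset([(r, c), (r, c + 1)]))
            elif wraparound:
                links.add(frozenset([(r, c), (r, 0)]))
            # Link to the neighbour below (wraps to row 0 on a torus).
            if r + 1 < n:
                links.add(frozenset([(r, c), (r + 1, c)]))
            elif wraparound:
                links.add(frozenset([(r, c), (0, c)]))

    def side(node):
        # True for the left half, False for the right half.
        return node[1] < half

    # A link is cut when its two endpoints lie in different halves.
    return sum(1 for a, b in links if side(a) != side(b))


mesh_cut = crossing_links(8, wraparound=False)
torus_cut = crossing_links(8, wraparound=True)
print(mesh_cut, torus_cut)
```

For an 8 x 8 grid the mesh has 8 links across the cut and the torus has 16, because each row contributes one extra wrap-around link. This doubling is what the abstract relies on when it compares complete exchange time on a torus with that on an equal-sized mesh.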

Index Terms:
Collective communication, complete exchange, distributed memory systems, interprocessor communication, parallel computing, torus, wormhole routing.
Citation:
Yu-Chee Tseng, Ting-Hsien Lin, Sandeep K. S. Gupta, Dhabaleswar K. Panda, "Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 4, pp. 380-396, April 1997, doi:10.1109/71.588613