<p><b>Abstract</b>—All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. Several complete exchange algorithms have been proposed in the literature for wormhole meshes. However, these algorithms, when applied to tori, cannot take advantage of the wrap-around interconnections to implement complete exchange with reduced latency. In this paper, a new <it>diagonal-propagation approach</it> is proposed to develop a set of complete exchange algorithms for 2D and 3D tori. This approach exploits the symmetric interconnections of tori and allows the development of a communication schedule consisting of several contention-free phases. These algorithms are indirect in nature and use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus, which is twice that of an equal-sized mesh, to achieve complete exchange in a time that is almost half of the best known complete exchange time on an equal-sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (obtained by introducing barriers between phases) lead to reduced latency for complete exchange with large messages, while asynchronous implementations are better for smaller messages.</p>
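The factor of two in bisection bandwidth mentioned in the abstract can be checked with a small link count. The sketch below is not one of the paper's algorithms; it only counts the links that cross a vertical cut through the middle of a k x k network, with and without wrap-around links. The function name `bisection_links` and the restriction to even k of at least 4 are assumptions made for this illustration.

```python
def bisection_links(k: int, wraparound: bool) -> int:
    """Count links crossing the cut between columns [0, k/2) and [k/2, k).

    Illustrative helper (hypothetical name). Assumes a k x k network with
    even k >= 4, so that each wrap-around link is distinct from the direct
    link between the same pair of columns.
    """
    if k < 4 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 4")
    half = k // 2
    crossing = 0
    for _row in range(k):
        for col in range(k):
            # Visit each horizontal link once, from `col` to its right neighbor.
            # Vertical links stay within one column and never cross this cut.
            nxt = col + 1
            if nxt == k:
                if not wraparound:
                    continue  # a mesh has no link past the last column
                nxt = 0       # a torus wraps back to column 0
            if (col < half) != (nxt < half):
                crossing += 1
    return crossing


if __name__ == "__main__":
    for k in (4, 8, 16):
        mesh = bisection_links(k, wraparound=False)
        torus = bisection_links(k, wraparound=True)
        print(f"k={k}: mesh={mesh} links, torus={torus} links")
```

Under these assumptions, each row contributes one crossing link in the mesh and two in the torus, which is where the doubled bisection bandwidth comes from.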
Keywords: Collective communication, complete exchange, distributed memory systems, interprocessor communication, parallel computing, torus, wormhole routing.
Yu-Chee Tseng, Sandeep K. S. Gupta, Ting-Hsien Lin, Dhabaleswar K. Panda, "Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach", IEEE Transactions on Parallel & Distributed Systems, vol. 8, no. , pp. 380-396, April 1997, doi:10.1109/71.588613