<p><b>Abstract</b>—This paper proposes a <it>Barrier Tree for Meshes</it> (<it>BTM</it>) to minimize the barrier synchronization latency for two-dimensional (2D) meshes. The proposed BTM scheme has two distinguishing features. First, the synchronization tree is 4-ary. The synchronization latency of the BTM scheme is asymptotically <tmath>$\Theta (\log_{4} n)$</tmath>, while that of the fastest scheme reported in the literature is bounded between <tmath>$\Omega (\log_{3} n)$</tmath> and <tmath>$O (n^{1/2})$</tmath>, where <tmath>$n$</tmath> is the number of member nodes. Second, nonmember nodes neither take part in the construction of a BTM nor actively participate in the synchronization operations, which avoids interference among different process groups during synchronization. Excluding nonmember nodes not only keeps the setup overhead low, but also reduces the synchronization latency. The low setup overhead is particularly beneficial for the dynamic process model provided in MPI-2. An extensive simulation study shows that, for meshes of up to <tmath>$64 \times 64$</tmath> nodes, the BTM scheme achieves about <tmath>$40 \sim 70$</tmath> percent shorter synchronization latency and is more scalable than conventional schemes.</p>
Barrier synchronization, hardware-supported barriers, communication latency, wormhole routing, MPI.
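To give a rough feel for the asymptotic bounds quoted in the abstract, the sketch below tabulates the three growth terms log_4 n, log_3 n, and n^(1/2) for square meshes up to 64 x 64. It assumes every node of a k x k mesh is a member (n = k^2), which the abstract does not require, and it ignores the constant factors hidden by the asymptotic notation, so the numbers are not latencies and do not reproduce the paper's simulation results. The function names are hypothetical and chosen only for this illustration.

```python
import math


def growth_terms(n):
    """Return (log_4 n, log_3 n, sqrt n) for n member nodes.

    These are only the growth terms inside the asymptotic bounds;
    constant factors and lower-order terms are deliberately ignored.
    """
    log4 = math.log2(n) / 2.0          # log_4 n = log_2 n / 2
    log3 = math.log(n) / math.log(3)   # change of base to 3
    return log4, log3, math.sqrt(n)


def full_mesh_table(sides=(4, 8, 16, 32, 64)):
    """Growth terms for k x k meshes, assuming all k*k nodes are members."""
    rows = []
    for k in sides:
        n = k * k  # assumption: the whole mesh forms one process group
        rows.append((k, n) + growth_terms(n))
    return rows


if __name__ == "__main__":
    print("mesh      n   log4(n)  log3(n)  sqrt(n)")
    for k, n, log4, log3, root in full_mesh_table():
        print(f"{k:2d}x{k:<2d} {n:6d}  {log4:7.2f}  {log3:7.2f}  {root:7.1f}")
```

The point of the comparison is the widening gap between the logarithmic terms and the square-root term as n grows, which is consistent with the abstract's claim that the BTM scheme is more scalable.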

S. Moh, D. Lee, B. Lee, D. Han, H. Youn and C. Yu, "Four-Ary Tree-Based Barrier Synchronization for 2D Meshes without Nonmember Involvement," in IEEE Transactions on Computers, vol. 50, no. , pp. 811-823, 2001.