Four-Ary Tree-Based Barrier Synchronization for 2D Meshes without Nonmember Involvement
August 2001 (vol. 50 no. 8)
pp. 811-823

Abstract: This paper proposes a Barrier Tree for Meshes (BTM) to minimize the barrier synchronization latency for two-dimensional (2D) meshes. The proposed BTM scheme has two distinguishing features. First, the synchronization tree is 4-ary. The synchronization latency of the BTM scheme is asymptotically $\Theta (\log_{4} n)$, while that of the fastest scheme reported in the literature is bounded between $\Omega (\log_{3} n)$ and $O (n^{1/2})$, where $n$ is the number of member nodes. Second, nonmember nodes neither take part in the construction of a BTM nor actively participate in the synchronization operations, which avoids interference among different process groups during synchronization. This both lowers the setup overhead and reduces the synchronization latency. The low setup overhead is particularly beneficial for the dynamic process model provided in MPI-2. An extensive simulation study shows that, for meshes of up to $64 \times 64$ nodes, the BTM scheme yields about 40 to 70 percent shorter synchronization latency and is more scalable than conventional schemes.
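To give a feel for the growth terms quoted in the abstract, the following is a minimal sketch that evaluates them for a fully populated $64 \times 64$ mesh ($n = 4096$). It is not the BTM construction, which the abstract does not describe. The function name `kary_tree_height` and the assumption of a complete k-ary tree are illustrative only, and constant factors and mesh routing distances are ignored.

```python
import math


def kary_tree_height(n: int, k: int) -> int:
    """Smallest height (edges from root to deepest leaf) of a k-ary tree
    that can hold n nodes. Hypothetical helper, not taken from the paper."""
    if n < 1 or k < 2:
        raise ValueError("need n >= 1 and k >= 2")
    height = 0
    capacity = 1      # nodes in a complete k-ary tree of the current height
    level_width = 1   # nodes on the deepest level of that tree
    while capacity < n:
        level_width *= k
        capacity += level_width
        height += 1
    return height


if __name__ == "__main__":
    n = 64 * 64  # member nodes when every node of a 64 x 64 mesh participates
    print("4-ary tree height:", kary_tree_height(n, 4))   # 6
    print("3-ary tree height:", kary_tree_height(n, 3))   # 8
    print("sqrt(n):          ", math.isqrt(n))            # 64
```

Under these simplifying assumptions, the gap between a logarithmic term and $n^{1/2}$ is already about an order of magnitude at $n = 4096$, which matches the direction of the scalability claim. The measured 40 to 70 percent figure comes from the authors' simulations and cannot be derived from this sketch.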

Index Terms:
Barrier synchronization, hardware-supported barriers, communication latency, wormhole routing, MPI.
S. Moh, C. Yu, B. Lee, H.Y. Youn, D. Han, D. Lee, "Four-Ary Tree-Based Barrier Synchronization for 2D Meshes without Nonmember Involvement," IEEE Transactions on Computers, vol. 50, no. 8, pp. 811-823, Aug. 2001, doi:10.1109/12.947001