Designing Tree-Based Barrier Synchronization on 2D Mesh Networks
June 1998 (vol. 9 no. 6)
pp. 526-534

Abstract—In this paper, we consider a tree-based routing scheme for supporting barrier synchronization on scalable parallel computers with a 2D mesh network. Based on the characteristics of a standard programming interface, the scheme builds a collective synchronization (CS) tree among the participating nodes using a distributed algorithm. When the routers are set up properly with the CS tree information, barrier synchronization can be accomplished very efficiently by passing simple messages. Performance evaluations show that our proposed method performs better than previous path-based approaches and is less sensitive to variations in group size and startup delay. However, our scheme has the extra overhead of building the CS tree. Thus, it is more suitable for parallel iterative computations in which the same barrier is invoked repetitively.
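
As a rough illustration of the idea in the abstract, the sketch below simulates one barrier over a tree spanning the participating nodes of a 2D mesh: arrival messages are gathered toward the root, then release messages travel back down. It is a minimal sketch under stated assumptions, not the paper's algorithm. The abstract does not describe how the CS tree is constructed, so the greedy, centralized `build_cs_tree` is a hypothetical stand-in for the distributed construction, and all function names here are invented for illustration.

```python
from collections import defaultdict


def manhattan(a, b):
    """Hop distance between two (x, y) nodes on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_cs_tree(participants, root):
    """Return a {node: parent} map spanning the participants.

    ASSUMPTION: a greedy, centralized construction that attaches each
    node to its nearest already-attached node. The paper uses a
    distributed algorithm whose details are not given in the abstract.
    """
    parent = {root: None}
    remaining = set(participants) - {root}
    while remaining:
        # Pick the (unattached, attached) pair with the smallest mesh
        # distance; the pair itself breaks ties deterministically.
        node, par = min(
            ((n, p) for n in remaining for p in parent),
            key=lambda pair: (manhattan(*pair), pair),
        )
        parent[node] = par
        remaining.remove(node)
    return parent


def barrier(parent):
    """Simulate one barrier; return (message count, released nodes)."""
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    messages = 0
    released = set()

    def gather(node):
        # Gather phase: every child subtree reports arrival to its parent.
        nonlocal messages
        for child in children[node]:
            gather(child)
            messages += 1

    def release(node):
        # Release phase: the root's notification fans out down the tree.
        nonlocal messages
        released.add(node)
        for child in children[node]:
            messages += 1
            release(child)

    gather(root)
    release(root)
    return messages, released


participants = [(0, 0), (0, 2), (3, 1), (2, 2)]
tree = build_cs_tree(participants, root=(0, 0))
messages, released = barrier(tree)
print(messages, sorted(released))
```

In this simulation the tree is built once and `barrier(tree)` can be called repeatedly, which mirrors the abstract's point that the setup cost pays off when the same barrier is invoked many times. Each barrier costs two messages per tree edge, one in each phase.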

Index Terms:
Barrier synchronization, collective communication, interconnection network, message passing interface, multicast.
Jenq-Shyan Yang, Chung-Ta King, "Designing Tree-Based Barrier Synchronization on 2D Mesh Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 6, pp. 526-534, June 1998, doi:10.1109/71.689440