Issue No.09 - September (2010 vol.59)
pp: 1187-1199
Igor Valerievich Zotov , Kursk Technical University, Kursk
This work presents a distributed hardware-level barrier mechanism for n-dimensional mesh-connected MIMD computers, called the Distributed Virtual Bit-Slice Synchronizer (DVBSS). The mechanism is built around a dedicated m-bit control network whose topology is a directed mesh-embeddable graph with an additional m-bit-wide wraparound connection. A virtualization scheme superposes p virtual m-bit barrier networks on the physical one, which allows the DVBSS model to synchronize more than m barrier groups. To minimize synchronization latency, the DVBSS scheme uses a distributed circulating wave clocking (DCW-clocking) technique to switch between virtual barrier networks in a pipelined fashion. The DVBSS scheme is shown to be general, configurable, and MPI-compatible. Unlike previously proposed distributed hardware barriers and hardware tree-based schemes, the DVBSS mechanism accepts dynamically defined (possibly overlapping) barrier groups of arbitrary size and shape, and allows noncontiguous allocation of group members.
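The idea of superposing p virtual m-bit barrier networks on one physical network can be illustrated with a small software model. This is only a sketch under stated assumptions and not the hardware design from the paper. The function names (slot_for_group, barrier_word, sweep), the mapping of a group number to a (virtual network, bit line) pair, and the resulting limit of m*p groups are hypothetical choices made for illustration. The mesh topology, the wraparound connection, and the DCW-clocking pipeline are not modeled; the virtual networks are simply visited in turn.

```python
# Illustrative software model only. All names and the group-to-slot mapping
# are assumptions made for this sketch, not details taken from the paper.

def slot_for_group(group_id, m):
    """Map a barrier group to (virtual network index, bit line index)."""
    return divmod(group_id, m)


def barrier_word(arrived, members, m, virtual_net):
    """Return the m-bit word observed on one virtual barrier network.

    arrived: dict mapping node -> set of group ids that node has reached
    members: dict mapping group id -> set of member nodes
    Bit b is set only when every member of group virtual_net * m + b has
    arrived (a logical AND over the members). Nonmember nodes never
    influence the bit, so groups may overlap and be noncontiguous.
    """
    word = 0
    for bit in range(m):
        group = virtual_net * m + bit
        nodes = members.get(group)
        if nodes and all(group in arrived.get(n, set()) for n in nodes):
            word |= 1 << bit
    return word


def sweep(arrived, members, m, p):
    """Visit the p virtual networks in turn and collect their m-bit words."""
    return [barrier_word(arrived, members, m, v) for v in range(p)]


# Two bit lines and two virtual networks give four group slots in this model.
M, P = 2, 2
members = {0: {0, 1}, 3: {1, 2, 5}}      # groups overlap at node 1
arrived = {0: {0}, 1: {0, 3}, 2: {3}}    # node 5 has not reached group 3 yet

before = sweep(arrived, members, M, P)   # group 0 complete, group 3 pending
arrived[5] = {3}
after = sweep(arrived, members, M, P)    # both groups complete
```

In this model, group 3 occupies bit 1 of virtual network 1, so its barrier completes only in the sweep taken after node 5 arrives, while group 0 is unaffected by the overlap at node 1.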
Barrier synchronization, dedicated barrier networks, hardware barriers, mesh-connected parallel computers.
Igor Valerievich Zotov, "Distributed Virtual Bit-Slice Synchronizer: A Scalable Hardware Barrier Mechanism for n-Dimensional Meshes", IEEE Transactions on Computers, vol.59, no. 9, pp. 1187-1199, September 2010, doi:10.1109/TC.2010.15