Issue No. 09 - September (2010, vol. 59)
pp. 1187-1199
Igor Valerievich Zotov , Kursk Technical University, Kursk
ABSTRACT
This work presents a distributed hardware-level barrier mechanism for n-dimensional mesh-connected MIMD computers, called the Distributed Virtual Bit-Slice Synchronizer (DVBSS). The proposed mechanism is structured around an m-bit dedicated control network whose topology is a directed mesh-embeddable graph with an additional m-bit-wide wraparound connection. A specific virtualization scheme superposes p virtual m-bit barrier networks on the physical one, so the DVBSS model can synchronize more than m barrier groups. To minimize synchronization latency, the DVBSS scheme uses a distributed circulating wave clocking (DCW-clocking) technique to switch between virtual barrier networks in a pipelined fashion. The DVBSS scheme is shown to be general, configurable, and MPI-compatible. Unlike previously proposed distributed hardware barriers and hardware tree-based schemes, the DVBSS mechanism accepts dynamically defined (possibly overlapping) barrier groups of arbitrary size and shape, allowing noncontiguous group member allocations.
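The virtualization idea in the abstract can be illustrated with a small software model. The sketch below is not the DVBSS hardware design. It is a toy model that assumes each barrier group is assigned one (time slot, bit line) pair and that the active slot simply rotates once per clock tick. The class name `ToyVirtualBarrierNet` and all of its methods are hypothetical and introduced only for illustration. It shows how p time slots over m bit lines could serve up to m * p groups, including overlapping and noncontiguous ones.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVirtualBarrierNet:
    """Toy model (assumption): p time slots multiplexed over m bit lines."""
    m: int  # number of physical bit lines
    p: int  # number of virtual networks (time slots)
    groups: dict = field(default_factory=dict)   # (slot, bit) -> member node ids
    arrived: dict = field(default_factory=dict)  # (slot, bit) -> arrived node ids

    def capacity(self) -> int:
        # One group per bit line per slot in this simplified model.
        return self.m * self.p

    def define_group(self, members) -> tuple:
        # Assign the first free (slot, bit) pair; members may overlap
        # with other groups and need not be contiguous.
        for slot in range(self.p):
            for bit in range(self.m):
                if (slot, bit) not in self.groups:
                    self.groups[(slot, bit)] = frozenset(members)
                    self.arrived[(slot, bit)] = set()
                    return (slot, bit)
        raise RuntimeError("no free (slot, bit) pair")

    def arrive(self, key, node) -> None:
        # A member node signals that it has reached the barrier.
        if node not in self.groups[key]:
            raise ValueError(f"node {node} is not a member of group {key}")
        self.arrived[key].add(node)

    def tick(self, t: int) -> list:
        # At tick t only slot t % p is active. A group in the active slot
        # is released when every member has arrived (a logical AND).
        slot = t % self.p
        released = []
        for bit in range(self.m):
            key = (slot, bit)
            if key in self.groups and self.arrived[key] == self.groups[key]:
                released.append(key)
                self.arrived[key] = set()  # reset for the next barrier episode
        return released


net = ToyVirtualBarrierNet(m=2, p=3)
a = net.define_group({0, 1})
b = net.define_group({1, 2, 5})   # overlaps group a at node 1
c = net.define_group({0, 5})      # noncontiguous members
net.arrive(a, 0)
net.arrive(a, 1)
first_release = net.tick(0)       # slot 0 is active, group a is complete
```

In this model a group waits at most p ticks for its slot to become active again, which is the latency cost that a pipelined switching scheme such as the DCW-clocking named in the abstract aims to keep small. How the real mechanism achieves that is not modeled here.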
INDEX TERMS
Barrier synchronization, dedicated barrier networks, hardware barriers, mesh-connected parallel computers.
CITATION
Igor Valerievich Zotov, "Distributed Virtual Bit-Slice Synchronizer: A Scalable Hardware Barrier Mechanism for n-Dimensional Meshes," IEEE Transactions on Computers, vol. 59, no. 9, pp. 1187-1199, September 2010, doi:10.1109/TC.2010.15