Integrated Network Barriers
April 2002 (vol. 13 no. 4)
pp. 337-348

Integrated Network Barriers (INBs) are a network protocol for parallel processors. INBs are pipelinable and have low latency. In this paper, we show that INBs implement barriers, which ensure that all pre-barrier operations of any processor appear to complete before any post-barrier operations, and we show how to construct efficient, deadlock-free barriers for any interconnection network and routing function whose queue dependency graph is acyclic. As a special case, INBs can be implemented for any network and routing function for which there exists an acyclic channel dependency graph.
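
The acyclicity precondition named in the abstract can be checked mechanically. The following is a minimal sketch, not the paper's construction: it assumes a dependency graph is given as a plain dictionary mapping each channel (or queue) to the channels a message holding it may wait for next. The function name `is_acyclic` and the two example topologies are hypothetical and chosen only for illustration.

```python
from collections import deque


def is_acyclic(dep_graph):
    """Return True if the directed dependency graph contains no cycle.

    dep_graph maps a channel (or queue) name to the names it may wait on.
    Uses Kahn's algorithm: repeatedly remove nodes with no incoming edges;
    a cycle exists exactly when some nodes can never be removed.
    """
    # Collect every node, including ones that appear only as successors.
    nodes = set(dep_graph)
    for successors in dep_graph.values():
        nodes.update(successors)

    # Count incoming dependency edges for each node.
    indegree = {node: 0 for node in nodes}
    for successors in dep_graph.values():
        for succ in successors:
            indegree[succ] += 1

    ready = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for succ in dep_graph.get(node, ()):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    return removed == len(nodes)


# Hypothetical unidirectional 4-node ring: each channel may wait on the next,
# and the last wraps around to the first, closing a dependency cycle.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}

# Hypothetical 4-node line with left-to-right channels only: no wraparound.
line = {"c0": ["c1"], "c1": ["c2"], "c2": []}

print(is_acyclic(ring))
print(is_acyclic(line))
```

Under these assumptions the line satisfies the precondition while the ring does not; whether a real network qualifies depends on the dependency graph its routing function actually induces.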

Index Terms:
barrier synchronization, high-performance computing, integrated network barriers, interconnection networks, parallel processing, routing
Jerry Stamatopoulos, Jon A. Solworth, "Integrated Network Barriers," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 4, pp. 337-348, April 2002, doi:10.1109/71.995814