Network-Based Multicomputers: A Practical Supercomputer Architecture
August 1996 (vol. 7 no. 8)
pp. 861-875

Abstract: Multicomputers built around a general network are an attractive architecture for a wide class of applications. The architecture provides many benefits compared with special-purpose approaches, including heterogeneity, reuse of application and system code, and sharing of resources. The architecture also poses new challenges to both computer system implementors and users. First, traditional local-area networks do not have enough bandwidth and create a communication bottleneck, thus seriously limiting the set of applications that can be run effectively. Second, programmers have to deal with large bodies of code distributed over a variety of architectures, and work in an environment where both the network and nodes are shared with other users. Our experience in the Nectar project shows that it is possible to overcome these problems. We show how networks based on high-speed crossbar switches and efficient protocol implementations can support high-bandwidth, low-latency communication while still enjoying the flexibility of general networks, and we use three applications to demonstrate that network-based multicomputers are a practical architecture. We also show how the network traffic generated by this new class of applications poses severe requirements for networks.

Index Terms:
Multicomputer, workstation cluster, high-speed network, host-network interface, distributed programming, traffic characteristics.
Peter Steenkiste, "Network-Based Multicomputers: A Practical Supercomputer Architecture," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 8, pp. 861-875, Aug. 1996, doi:10.1109/71.532117