Limits on Interconnection Network Performance
October 1991 (vol. 2, no. 4), pp. 398-412

The latency of direct networks is modeled, taking into account both switch and wire delays. A simple closed-form expression for contention in buffered, direct networks is derived and found to agree closely with simulations. The model includes the effects of packet size and communication locality. Analyzing networks under various constraints and workload parameters reveals that performance is highly sensitive to both. A two-dimensional network is shown to have the lowest latency only when switch delays and network contention are ignored; otherwise, three- or four-dimensional networks are favored. If communication locality exists, two-dimensional networks regain their advantage. Communication locality decreases both the base network latency and the network bandwidth requirements of applications. A much larger fraction of the resulting performance improvement is shown to arise from the reduction in bandwidth requirements than from the decrease in latency.
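To make the trade-off concrete, the following toy sketch compares base latencies of k-ary n-cube networks of different dimensionality. This is not the paper's exact model: it ignores wire delay and contention, and the channel width W = k/2 follows the usual fixed-bisection normalization from the k-ary n-cube literature. All parameter values (node count N, packet length, switch delay) are illustrative assumptions.

```python
# Toy latency comparison for k-ary n-cube direct networks.
# Assumptions (not from the paper): uniform random traffic, unidirectional
# channels, fixed-bisection channel width W = k/2, illustrative parameters.

def avg_hops(k: float, n: int) -> float:
    """Average hop count for uniform traffic in a unidirectional k-ary n-cube."""
    return n * (k - 1) / 2

def latency(k: float, n: int, L_bits: int, t_s: float) -> float:
    """Base latency: per-hop switch delay plus serialization of an L-bit
    packet over a channel whose width is set by a constant-bisection
    constraint (W = k/2)."""
    W = k / 2
    return avg_hops(k, n) * t_s + L_bits / W

N, L_bits = 256, 128  # 256 nodes, 128-bit packets (assumed values)
for t_s in (0.0, 1.0):  # switch delay ignored vs. one cycle per hop
    for n in (2, 3, 4):
        k = N ** (1 / n)  # radix giving N = k^n nodes
        print(f"t_s={t_s}, n={n}: latency = {latency(k, n, L_bits, t_s):.1f}")
```

With the switch delay set to zero, the low-dimensional network's wide channels dominate and n = 2 gives the lowest latency, in line with the abstract's first claim. This sketch omits the wire-delay and contention terms of the paper's full model, which are what shift the optimum toward three or four dimensions.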

[1] S. Abraham and K. Padmanabhan, "Performance of the direct binary n-cube network for multiprocessors," IEEE Trans. Comput., vol. 38, pp. 1000-1011, July 1989.
[2] A. Agarwal et al., "APRIL: A processor architecture for multiprocessing," in Proc. 17th Int. Symp. Computer Architecture, 1990, pp. 104-114.
[3] W. C. Athas and C. L. Seitz, "Multicomputers: Message-passing concurrent computers," IEEE Comput. Mag., vol. 21, pp. 9-24, Aug. 1988.
[4] S. Borkar et al., "iWarp: An integrated solution to high speed parallel computing," in Proc. Supercomputing '88, vol. 1, 1988, pp. 330-339.
[5] D. Chaiken, C. Fields, K. Kurihara, and A. Agarwal, "Directory-based cache coherence in large-scale multiprocessors," IEEE Comput. Mag., vol. 23, pp. 41-58, June 1990.
[6] A. Agarwal et al., "LimitLESS directories: A scalable cache coherence scheme," in Proc. Fourth Int. Conf. Architectural Support for Programming Languages and Operating Systems, 1991, pp. 224-234.
[7] W. J. Dally, A VLSI Architecture for Concurrent Data Structures. Boston, MA: Kluwer Academic, 1987, pp. 144-161.
[8] W. J. Dally, "Performance analysis of k-ary n-cube interconnection networks," IEEE Trans. Comput., vol. 39, pp. 775-785, June 1990.
[9] W. J. Dally et al., "The J-Machine: A fine-grain concurrent computer," in Proc. IFIP Congress, 1989.
[10] D. Gajski, D. Kuck, D. Lawrie, and A. Saleh, "Cedar -- A large scale multiprocessor," in Proc. Int. Conf. Parallel Processing, Aug. 1983, pp. 524-529.
[11] A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer -- Designing a MIMD shared-memory parallel machine," IEEE Trans. Comput., vol. C-32, pp. 175-189, Feb. 1983.
[12] R. Halstead and S. Ward, "The MuNet: A scalable decentralized architecture for parallel computation," in Proc. 7th Annu. Symp. Comput. Architecture, May 1980, pp. 139-145.
[13] W. D. Hillis, The Connection Machine. Cambridge, MA: MIT Press, 1985.
[14] P. Kermani and L. Kleinrock, "Virtual cut-through: A new computer communication switching technique," Comput. Networks, vol. 3, pp. 267-286, Oct. 1979.
[15] L. Kleinrock, Queueing Systems. New York: Wiley, 1975.
[16] C. P. Kruskal and M. Snir, "The performance of multistage interconnection networks for multiprocessors," IEEE Trans. Comput., vol. C-32, pp. 1091-1098, Dec. 1983.
[17] C. P. Kruskal, M. Snir, and A. Weiss, "The distribution of waiting times in clocked multistage interconnection networks," IEEE Trans. Comput., vol. 37, pp. 1337-1352, Nov. 1988.
[18] J. T. Kuehn and B. J. Smith, "The Horizon supercomputing system: Architecture and software," in Proc. Supercomputing '88, Nov. 1988, pp. 28-34.
[19] D. H. Lawrie, "Access and alignment of data in an array processor," IEEE Trans. Comput., vol. C-24, pp. 1145-1155, Dec. 1975.
[20] D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam, "Design of the Stanford DASH multiprocessor," Comput. Syst. Lab. TR 89-403, Stanford Univ., Dec. 1989.
[21] A. Norton and G. F. Pfister, "A methodology for predicting multiprocessor performance," in Proc. Int. Conf. Parallel Processing, Aug. 1985, pp. 772-781.
[22] J. H. Patel, "Performance of processor-memory interconnections for multiprocessors," IEEE Trans. Comput., vol. C-30, pp. 771-780, Oct. 1981.
[23] G. F. Pfister, W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfelder, K. P. McAuliffe, E. A. Melton, A. Norton, and J. Weiss, "The IBM Research Parallel Processor Prototype (RP3): Introduction and architecture," in Proc. Int. Conf. Parallel Processing, Aug. 1985, pp. 764-771.
[24] C. L. Seitz, "Concurrent VLSI architectures," IEEE Trans. Comput., vol. C-33, pp. 1247-1265, Dec. 1984.
[25] C. L. Seitz, "The Cosmic Cube," Commun. ACM, pp. 22-33, Jan. 1985.
[26] C. L. Seitz et al., "The architecture and programming of the Ametek Series 2010 Multicomputer," in Proc. Third Conf. Hypercube Concurrent Comput. Appl., Jan. 1988, pp. 33-37.
[27] H. J. Siegel, Interconnection Networks for Large-Scale Parallel Processing: Theory and Case Studies, 2nd ed. New York: McGraw-Hill, 1990.
[28] H. Sullivan and T. R. Bashkow, "A large scale, homogeneous, fully distributed parallel machine, I," in Proc. 4th Symp. Comput. Architecture, Mar. 1977, pp. 105-117.
[29] C. D. Thompson, "A complexity theory for VLSI," Ph.D. dissertation, Dep. Comput. Sci., Carnegie Mellon Univ., 1980.

Index Terms:
buffered networks; interconnection network performance; latency; direct networks; wire delays; closed-form expression; packet size; communication locality; two-dimensional network; switch delays; network contention; four-dimensional networks; network bandwidth requirements; multiprocessor interconnection networks; performance evaluation
A. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398-412, Oct. 1991, doi:10.1109/71.97897