The Performance of the Cedar Multistage Switching Network
April 1997 (vol. 8 no. 4)
pp. 321-336

Abstract: While multistage switching networks for vector multiprocessors have been studied extensively, detailed evaluations of their performance are rare. Indeed, analytical models, simulations with pseudosynthetic loads, studies focused on average-value parameters, and measurements of networks disconnected from the machine all provide limited information. In this paper, instead, we present an in-depth empirical analysis of a multistage switching network in a realistic setting: We use hardware probes to examine the performance of the omega network of the Cedar shared-memory machine executing real applications. The machine is configured with 16 vector processors.

The analysis suggests that the performance of multistage switching networks is limited by traffic nonuniformities. We identify two major nonuniformities that degrade Cedar's performance and are likely to slow down other networks too. The first one is the contention caused by the return messages in a vector access as they converge from the memories to one processor port. This traffic convergence penalizes vector reads and, more importantly, causes tree saturation. The second nonuniformity is the uneven contention delays induced by a relatively fair scheme to resolve message collisions. Based on our observations, we argue that intuitive optimizations for multistage switching networks may not be the most cost-effective ones. Instead, we suggest changes to increase the network bandwidth at the root of the traffic convergence tree and to delay traffic convergence up until the final stages of the network.
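
The traffic convergence described above can be illustrated with a small toy model. The sketch below assumes a generic 16-port omega network built from 2x2 switches with destination-tag routing; this is an assumption for illustration only and is not taken from the paper, which does not describe Cedar's switch size, buffering, or arbitration in this abstract. The names `omega_path` and `max_link_load` are hypothetical. The code counts how many messages want each link after every stage, first when all 16 memories reply to one processor port, then when each source sends to a distinct destination.

```python
from collections import Counter


def omega_path(src, dest, n_bits):
    """Link positions a message occupies after each stage of a toy omega network.

    Assumes 2x2 switches and destination-tag routing: each stage applies a
    perfect shuffle (shift left, drop the top bit) and then sets the low bit
    to the next destination bit, most significant bit first.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        dest_bit = (dest >> (n_bits - 1 - stage)) & 1
        pos = ((pos << 1) & mask) | dest_bit
        path.append(pos)
    return path


def max_link_load(pairs, n_bits):
    """Largest number of messages contending for a single link, per stage."""
    loads = [Counter() for _ in range(n_bits)]
    for src, dest in pairs:
        for stage, pos in enumerate(omega_path(src, dest, n_bits)):
            loads[stage][pos] += 1
    return [max(counter.values()) for counter in loads]


N_BITS = 4  # 2**4 = 16 ports, matching the 16-processor configuration
ports = range(1 << N_BITS)

# Return messages of a vector read: every memory replies to processor port 0.
converging = [(memory, 0) for memory in ports]
# For contrast, a pattern in which every source targets a distinct port.
spread = [(port, port) for port in ports]

print(max_link_load(converging, N_BITS))  # [2, 4, 8, 16]
print(max_link_load(spread, N_BITS))      # [1, 1, 1, 1]
```

In this model the converging pattern doubles the worst-case link demand at every stage, so the final link into the processor port is the root of the convergence tree, while the distinct-destination pattern never shares a link. The model ignores queueing and timing, so it shows only where contention concentrates, not the delays measured in the paper.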

Index Terms:
Multistage switching networks, vector multiprocessors, performance evaluation, experimental analysis, address tracing.
Citation:
Josep Torrellas, Zheng Zhang, "The Performance of the Cedar Multistage Switching Network," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 4, pp. 321-336, April 1997, doi:10.1109/71.588598