Architecture Scalability of Parallel Vector Computers with a Shared Memory
May 1998 (vol. 47 no. 5)
pp. 614-624

Abstract: The scalability properties of a parallel vector computer with a shared memory are derived from a model of the machine. The processor-memory interconnection network is assumed to be composed of crossbar switches of size b×b. This paper analyzes sustainable peak performance under optimal conditions, i.e., no memory bank conflicts, sufficient processor-memory bank pathways, and no interconnection network conflicts. It is shown that, even with fully vectorizable algorithms and no communication overhead, the sustainable peak performance does not scale up linearly with the number of processors p. If the interconnection network is unbuffered, the number of memory banks must grow at least as O(p log_b p) to sustain peak performance. If the network is buffered, this bottleneck can be alleviated; however, the half-performance vector length still grows as O(log_b p). The paper confirms the validity of the model by examining the performance behavior of the LINPACK benchmark.
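
The growth rates quoted in the abstract can be made concrete with a small sketch. It assumes, as an interpretation the abstract does not state explicitly, that the log_b p factor corresponds to the number of switch stages in a multistage network built from b×b crossbars. The function names `stages` and `bank_growth` are hypothetical, and the values are growth terms only, not bank counts, because the abstract gives no constants.

```python
def stages(p: int, b: int) -> int:
    """Smallest s with b**s >= p, i.e. ceil(log_b p) computed with integers.

    Assumption: this is the stage count of a multistage network of
    b x b crossbars connecting p processors; the abstract only gives
    the O(log_b p) term, not this interpretation.
    """
    if p < 1 or b < 2:
        raise ValueError("need p >= 1 and b >= 2")
    s, reach = 0, 1
    while reach < p:
        reach *= b
        s += 1
    return s


def bank_growth(p: int, b: int) -> int:
    """Growth term p * ceil(log_b p) for the unbuffered case.

    Illustrative only: the constant factor is not given in the abstract.
    """
    return p * stages(p, b)


if __name__ == "__main__":
    # Compare the growth term against linear growth in p for 4 x 4 switches.
    for p in (4, 16, 64, 256):
        print(p, stages(p, 4), bank_growth(p, 4))
```

Under these assumptions, going from 4 to 256 processors with 4×4 switches multiplies p by 64 but the bank growth term by 256, which is the superlinear requirement the abstract describes for the unbuffered network.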

Index Terms:
Architecture scalability, parallel vector computers, shared memory, sustainable peak performance, theoretical peak performance.
Eskil Dekker, "Architecture Scalability of Parallel Vector Computers with a Shared Memory," IEEE Transactions on Computers, vol. 47, no. 5, pp. 614-624, May 1998, doi:10.1109/12.677257