Issues in the Design of High Performance SIMD Architectures
August 1996 (vol. 7 no. 8)
pp. 818-829

Abstract: In this paper, we consider the design of high performance SIMD architectures. We examine three mechanisms that may improve the performance of this class of machines and that have been largely unexplored by the SIMD community: pipelined instruction broadcast, pipelining of the processing element (PE) architecture, and a novel memory hierarchy in the PE address space, which we denote the direct only data cache (dod-cache). For each improvement, we develop an analytical model of the potential speedup and apply it to real program traces obtained on a MasPar MP-2 system. We also consider the impact of all three improvements taken together.
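
To give a sense of what an analytical speedup model for pipelining looks like, the sketch below computes the standard textbook speedup of an ideal k-stage pipeline. It is not the model developed in the paper: the function name, its parameters, and the assumptions of one cycle per stage and no stalls are illustrative only.

```python
def ideal_pipeline_speedup(num_instructions: int, num_stages: int) -> float:
    """Textbook speedup of an ideal pipeline over an unpipelined unit.

    Assumptions (illustrative, not taken from the paper): every stage
    takes one cycle, and there are no stalls or hazards.
    """
    if num_instructions < 1 or num_stages < 1:
        raise ValueError("num_instructions and num_stages must be >= 1")
    # Unpipelined: each instruction occupies the unit for all stages.
    unpipelined_cycles = num_instructions * num_stages
    # Pipelined: fill the pipeline once, then retire one instruction per cycle.
    pipelined_cycles = num_stages + num_instructions - 1
    return unpipelined_cycles / pipelined_cycles


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, round(ideal_pipeline_speedup(n, 4), 3))
```

Under these assumptions the speedup approaches the number of stages as the instruction stream grows, which is why measured program traces matter: real streams include stalls that this idealized bound ignores.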

Index Terms:
SIMD, pipelining, caches, MasPar, data parallel.
James D. Allen, David E. Schimmel, "Issues in the Design of High Performance SIMD Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 8, pp. 818-829, Aug. 1996, doi:10.1109/71.532113