Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors
January 1994 (vol. 43 no. 1)
pp. 52-67

This paper investigates the performance of word-packet, slotted unidirectional ring-based hierarchical direct networks in the context of large-scale shared memory multiprocessors. Slotted unidirectional rings are attractive because their electrical characteristics and simple interfaces allow for fast cycle times and large bandwidths. For large-scale systems, it is necessary to use multiple rings for increased aggregate bandwidth. Hierarchies are attractive because the topology ensures unique paths between nodes, simple node interfaces, and simple inter-ring connections. To ensure that a realistic region of the design space is examined, the architecture of the network used in the Hector prototype is adopted as the initial design point. A simulator of that architecture has been developed and validated with measurements from the prototype. The system and workload parameterization reflects conditions expected in the near future. The results of this study show the importance of system balance to performance.
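
The unique-path property of a ring hierarchy can be illustrated with a minimal sketch. It is not taken from the paper or from the Hector simulator; the addressing scheme (a node as a tuple of positions, top-level ring first, with each ring named by an address prefix) and the function name `ring_path` are assumptions made purely for illustration.

```python
def ring_path(src, dst):
    """Return the sequence of rings a packet visits from src to dst.

    Assumed (hypothetical) addressing: a node is a tuple of positions, one per
    hierarchy level, top-level ring first. A ring is named by the address
    prefix shared by every node beneath it; () is the top-level ring.
    """
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of levels")
    if src == dst:
        return []  # no network traversal needed
    # The length of the common prefix identifies the lowest ring shared by both.
    common = 0
    while src[common] == dst[common]:
        common += 1
    depth = len(src) - 1  # prefix length that names a node's own (lowest) ring
    # Ascend from the source's local ring up to the shared ring...
    up = [src[:n] for n in range(depth, common - 1, -1)]
    # ...then descend to the destination's local ring.
    down = [dst[:n] for n in range(common + 1, depth + 1)]
    return up + down
```

Under these assumptions, two nodes on the same local ring, such as `(0, 1, 2)` and `(0, 1, 3)`, communicate over that ring alone, while `(0, 1, 2)` and `(1, 0, 0)` must climb to the top-level ring and descend again. Because each ring has exactly one parent, the sequence of rings is determined entirely by the two addresses, which is why the inter-ring interfaces need no routing choices.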

Index Terms:
performance evaluation; shared memory systems; parallel architectures; shared memory multiprocessors; ring-based; performance; Hector prototype; slotted unidirectional ring; large-scale systems; communication locality; hot spots; large scale parallel systems; memory banks; prefetching.
M. Holliday, M. Stumm, "Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors," IEEE Transactions on Computers, vol. 43, no. 1, pp. 52-67, Jan. 1994, doi:10.1109/12.250609