Performance of Pruning-Cache Directories for Large-Scale Multiprocessors
May 1993 (vol. 4 no. 5)
pp. 520-534

Multis, shared-memory multiprocessors that are implemented with single buses and snooping cache protocols, are inherently limited to a small number of processors, and, as systems grow beyond a single bus, the bandwidth requirements of broadcast operations limit scalability. Hardware support to provide cache coherence without the use of broadcast can become very expensive. An approach to maintaining coherence using approximate information held in special-purpose caches called pruning-caches that provides robust performance over a wide range of workloads is presented. The pruning-cache approach is compared to the more conventional inclusion cache for providing multilevel inclusion (MLI) in the cache hierarchy. It is shown that pruning-caches are more cost-effective and more robust. Using both analysis and simulation, it is also shown that the k-ary n-cube topology provides scalable, bottleneck-free communication for uniform, point-to-point traffic.
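
For readers unfamiliar with the k-ary n-cube topology named in the abstract, the following is a minimal sketch of its basic geometry: k nodes along each of n dimensions, k^n nodes in total. It is not taken from the paper. The function names are hypothetical, and the sketch assumes bidirectional wraparound links and dimension-by-dimension shortest-path routing, which may differ from the network the authors analyze.

```python
from itertools import product


def node_count(k: int, n: int) -> int:
    """Number of nodes in a k-ary n-cube: k positions in each of n dimensions."""
    return k ** n


def ring_distance(a: int, b: int, k: int) -> int:
    """Shortest hop count between positions a and b on a k-node ring.

    Assumption: links are bidirectional, so traffic may travel either way.
    """
    d = abs(a - b)
    return min(d, k - d)


def hop_distance(src: tuple, dst: tuple, k: int) -> int:
    """Hop count between two nodes, summing the ring distance per dimension."""
    return sum(ring_distance(a, b, k) for a, b in zip(src, dst))


def mean_distance(k: int, n: int) -> float:
    """Average hop count from one node to every other node.

    This models uniform point-to-point traffic; the topology is symmetric,
    so the result is the same for any choice of source node.
    """
    origin = (0,) * n
    others = [p for p in product(range(k), repeat=n) if p != origin]
    return sum(hop_distance(origin, p, k) for p in others) / len(others)


if __name__ == "__main__":
    print(node_count(4, 3))                # nodes in a 4-ary 3-cube
    print(hop_distance((0, 0), (3, 1), 4))  # hops between two nodes, k = 4
    print(mean_distance(4, 2))             # mean hops under uniform traffic
```

Under these assumptions, the mean distance grows with n times roughly k/4, while the node count grows as k^n, which gives some intuition for why the topology is attractive at scale.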

Index Terms:
pruning-cache directories; large-scale multiprocessors; shared-memory multiprocessors; multilevel inclusion; n-cube topology; bottleneck-free communication; buffer storage; memory architecture; multiprocessor interconnection networks; shared memory systems; storage management
S.L. Scott, J.R. Goodman, "Performance of Pruning-Cache Directories for Large-Scale Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 5, pp. 520-534, May 1993, doi:10.1109/71.224215