The DASH Prototype: Logic Overhead and Performance
January 1993 (vol. 4 no. 1)
pp. 41-61

The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. The hardware overhead of directory-based cache coherence in a 48-processor prototype is examined. The data show that the overhead is only about 10-15%, which appears to be a small cost for the ease of programming offered by coherent caches and the potential for higher performance. The performance of the system is discussed, and the speedups obtained by a variety of parallel applications running on the prototype are shown. Using a sophisticated hardware performance monitor, the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup are characterized. The optimizations incorporated in the DASH protocol are evaluated in terms of their effectiveness on parallel applications and on atomic tests that stress the memory system.

Index Terms:
DASH project; large-scale shared-memory multiprocessors; directory-based cache coherence; coherent caches; hardware performance monitor; reference behavior; DASH protocol; atomic tests; buffer storage; parallel programming; performance evaluation; shared memory systems; storage management
D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, J. Hennessy, "The DASH Prototype: Logic Overhead and Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 1, pp. 41-61, Jan. 1993, doi:10.1109/71.205652