High-Performance DRAMs in Workstation Environments
November 2001 (vol. 50 no. 11)
pp. 1133-1153

Abstract: This paper presents a simulation-based performance study of several new high-performance DRAM architectures, each evaluated in a small-system organization. These small-system organizations correspond to workstation-class computers and use only a handful of DRAM chips (~10, as opposed to ~1 or ~100). The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Double Data Rate, Synchronous Link, Rambus, and Direct Rambus designs. Our simulations reveal several things: 1) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; 2) bus transmission speed will soon become a primary factor limiting memory-system performance; 3) the post-L2 address stream still contains significant locality, though it varies from application to application; 4) systems without L2 caches are feasible for low- and medium-speed CPUs (1 GHz and below); and 5) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques that exploit the available locality to decrease access time.
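
Finding 5 can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's simulator: it assumes a deliberately simplified model in which fetching one cache line after a row miss costs a fixed row access time, a fixed column access time, and a transfer time that shrinks as the bus widens. All timing values, the line size, and the function names are hypothetical, chosen only to show the trend.

```python
def access_time_ns(row_ns, col_ns, line_bytes, bus_bytes, bus_cycle_ns):
    """Time to fetch one cache line after a row miss (simplified model).

    Assumes row access, column access, and data transfer happen back to
    back with no overlap; real DRAM systems pipeline some of this.
    """
    transfers = -(-line_bytes // bus_bytes)  # ceiling division
    return row_ns + col_ns + transfers * bus_cycle_ns


def row_fraction(row_ns, col_ns, line_bytes, bus_bytes, bus_cycle_ns):
    """Share of the total access time spent in the row access."""
    total = access_time_ns(row_ns, col_ns, line_bytes, bus_bytes, bus_cycle_ns)
    return row_ns / total


# Hypothetical parameters, not taken from the paper or any data sheet.
ROW_NS = 30.0        # row access time
COL_NS = 30.0        # column access time
LINE_BYTES = 128     # cache line size
BUS_CYCLE_NS = 10.0  # time per bus transfer

if __name__ == "__main__":
    for bus_bytes in (8, 16, 32, 64):
        total = access_time_ns(ROW_NS, COL_NS, LINE_BYTES, bus_bytes, BUS_CYCLE_NS)
        share = row_fraction(ROW_NS, COL_NS, LINE_BYTES, bus_bytes, BUS_CYCLE_NS)
        print(f"bus {bus_bytes:2d} B: total {total:6.1f} ns, row share {share:.1%}")
```

Under these assumed numbers, the row access stays fixed while the transfer time drops with each doubling of bus width, so the row access accounts for a growing share of the total. That is the sense in which row access time becomes more prominent on wider buses.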

Index Terms:
DRAM architectures, DRAM performance, DRAM systems, system modeling, DDR DRAM, Direct Rambus DRAM, PC100 SDRAM, DDR2 DRAM.
Vinodh Cuppu, Bruce Jacob, Brian Davis, Trevor Mudge, "High-Performance DRAMs in Workstation Environments," IEEE Transactions on Computers, vol. 50, no. 11, pp. 1133-1153, Nov. 2001, doi:10.1109/12.966491