Memory speeds have not kept up with processor speeds. More precisely, DRAM latency has not kept pace: processor speeds have been increasing by at least 70 percent per year, while DRAM latency has improved only 7 percent annually. As a result, a contemporary superscalar 300-MHz DEC Alpha system with a 40-ns DRAM can execute at least 24 instructions in the time it takes to access its memory just once. In a few years, if current trends continue, the number of instructions per access could increase to a thousand. Fortunately, memory bandwidth is another matter. Wider buses, multiple banks, more pins, the integrated circuit properties of DRAMs (such as static-column mode and on-chip cache), and the newer Rambus and synchronous DRAM have all contributed to bandwidths that have scaled better than latency. A central problem for memory system designers is how to exploit this bandwidth to achieve lower latencies. In this article, we describe a technique that can convert more than 90 percent of a memory system's bandwidth into low-latency accesses, at least for a particular class of computations. The scheme nicely complements traditional caching in two ways: it handles frequently occurring memory reference patterns for which caches do not perform well and, by removing this problematic data from the cache, it reduces pollution, making the cache more effective for the remaining references.
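The arithmetic behind the figures above can be checked with a short sketch. The clock rate, DRAM latency, and growth rates come from the text; the issue width of two instructions per cycle is an assumption chosen because it reproduces the "at least 24" figure, and reading "improved 7 percent annually" as latency shrinking by a factor of 1.07 per year is likewise an assumption. The function names are illustrative only.

```python
import math

CLOCK_HZ = 300e6         # 300-MHz processor (from the text)
DRAM_LATENCY_S = 40e-9   # 40-ns DRAM access (from the text)
ISSUE_WIDTH = 2          # ASSUMPTION: instructions issued per cycle


def instructions_per_access(clock_hz, latency_s, issue_width):
    """Instructions the processor could issue during one DRAM access."""
    cycles = clock_hz * latency_s  # 300e6 * 40e-9 is about 12 cycles
    return cycles * issue_width


def years_until(target, start, cpu_growth=0.70, dram_improvement=0.07):
    """Years for the instructions-per-access gap to grow from start to target.

    ASSUMPTION: processor speed is multiplied by (1 + cpu_growth) each year
    and DRAM latency is divided by (1 + dram_improvement) each year.
    """
    yearly_ratio = (1 + cpu_growth) / (1 / (1 + dram_improvement))
    return math.log(target / start) / math.log(yearly_ratio)


now = instructions_per_access(CLOCK_HZ, DRAM_LATENCY_S, ISSUE_WIDTH)
years = years_until(1000, now)

print(f"instructions per access today: {now:.0f}")
print(f"years until 1000 per access:   {years:.1f}")
```

Under these assumptions the gap widens by a factor of roughly 1.8 each year, so the thousand-instruction mark falls about six years out, which is consistent with the "in a few years" projection.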
Sally A. McKee, Robert H. Klenke, Kenneth L. Wright, William A. Wulf, Maximo H. Salinas, James H. Aylor, Alan P. Batson, "Smarter Memory: Improving Bandwidth for Streamed References", Computer, vol. 31, no. , pp. 54-63, July 1998, doi:10.1109/2.689677