Issue No. 01 - January-June (2006 vol. 5)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/L-CA.2006.4
Data parallel memory systems must maintain a large number of outstanding memory references to fully use increasing DRAM bandwidth in the presence of increasing latency. At the same time, the throughput of modern DRAMs is very sensitive to access patterns because of the time required to precharge and activate banks and to switch between read and write accesses. To achieve memory reference parallelism, a system may simultaneously issue references from multiple reference threads. Alternatively, multiple references from a single thread can be issued in parallel. In this paper, we examine this tradeoff and show that allowing only a single thread to access DRAM at any given time significantly improves performance by increasing the locality of the reference stream, which reduces precharge/activate operations and read/write turnaround. Simulations of scientific and multimedia applications show that generating multiple references from a single thread gives, on average, 17% better performance than generating references from two parallel threads.
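The locality argument in the abstract can be illustrated with a toy model. The sketch below is not the simulator used in the paper; it assumes a single DRAM bank with one open row, a hypothetical row size of 8 words, and two hypothetical threads that each stream through their own address region. It counts only row activations and ignores read/write turnaround, bank-level parallelism, and any reordering a real memory controller would perform.

```python
from itertools import chain

ROW_SIZE = 8  # words per DRAM row (hypothetical toy value)


def row_activations(addresses, row_size=ROW_SIZE):
    """Count row activations for one bank that keeps a single row open."""
    open_row = None
    activations = 0
    for addr in addresses:
        row = addr // row_size
        if row != open_row:
            # Row miss: the bank must precharge and activate the new row.
            activations += 1
            open_row = row
    return activations


def interleave(a, b):
    """Alternate references from two threads, one reference at a time."""
    return list(chain.from_iterable(zip(a, b)))


# Two hypothetical threads, each streaming through its own region.
thread_a = list(range(0, 32))        # rows 0..3
thread_b = list(range(1024, 1056))   # rows 128..131

# Only one thread accesses DRAM at a time.
single_acts = row_activations(thread_a + thread_b)

# References from both threads reach DRAM interleaved.
parallel_acts = row_activations(interleave(thread_a, thread_b))

print(single_acts, parallel_acts)  # prints "8 64"
```

In this deliberately worst-case setup, interleaving the two streams forces a row change on every access, while serving one thread at a time activates each row only once. The sketch shows the direction of the effect only; it says nothing about its size in a real system.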
J. H. Ahn and W. J. Dally, "Data Parallel Address Architecture," in IEEE Computer Architecture Letters, vol. 5, no. 1, pp. 30-33, 2006.