Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Shared Memory Controllers
IEEE Micro, January/February 2009 (vol. 29, no. 1)
pp. 22-32
Onur Mutlu, Carnegie Mellon University
Thomas Moscibroda, Microsoft Research

Uncontrolled interthread interference in main memory can destroy individual threads' memory-level parallelism: it effectively serializes a thread's memory requests, whose latencies would otherwise have largely overlapped, and thereby reduces single-thread performance. The parallelism-aware batch scheduler preserves each thread's memory-level parallelism, ensures fairness and starvation freedom, and supports system-level thread priorities.
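The abstract does not spell out the mechanism. The following is a minimal sketch under stated assumptions, modeled on our reading of the batching and ranking rules in the authors' ISCA 2008 paper. Each batch marks up to `marking_cap` of the oldest outstanding requests per thread per bank; a new batch forms only after every marked request has been serviced, which bounds how long any request can wait. Threads are ranked with a shortest-job-first heuristic (fewest marked requests in their most-loaded bank, ties broken by fewest marked requests overall). A bank then prefers marked requests, then row-buffer hits, then higher-ranked threads, then older requests. Because all banks favor the same top-ranked thread, that thread's requests to different banks tend to be serviced concurrently, which is how the scheduler preserves memory-level parallelism. The names `Request`, `BatchScheduler`, `marking_cap`, and `pick` are hypothetical. DRAM timing, command scheduling, and system-level thread priorities are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    thread: int    # requesting thread (core) ID
    bank: int      # DRAM bank the request maps to
    row: int       # DRAM row within that bank
    arrival: int   # arrival time; smaller means older
    marked: bool = False  # True while the request belongs to the current batch


class BatchScheduler:
    """Toy sketch of parallelism-aware batch scheduling (hypothetical interface)."""

    def __init__(self, marking_cap):
        self.marking_cap = marking_cap  # max marked requests per thread per bank
        self.queue = []                 # outstanding requests for all banks
        self.rank = {}                  # thread -> rank, 0 = highest priority

    def enqueue(self, req):
        self.queue.append(req)

    def _form_batch(self):
        # Batching: mark the oldest `marking_cap` requests of every thread in every bank.
        groups = defaultdict(list)
        for req in self.queue:
            groups[(req.thread, req.bank)].append(req)
        for reqs in groups.values():
            reqs.sort(key=lambda r: r.arrival)
            for req in reqs[: self.marking_cap]:
                req.marked = True
        # Ranking (shortest job first): fewer marked requests in the thread's
        # most-loaded bank ranks higher; ties go to fewer marked requests in total.
        load = defaultdict(lambda: defaultdict(int))
        for req in self.queue:
            if req.marked:
                load[req.thread][req.bank] += 1
        order = sorted(load, key=lambda t: (max(load[t].values()), sum(load[t].values())))
        self.rank = {thread: i for i, thread in enumerate(order)}

    def pick(self, bank, open_row):
        """Remove and return the next request for `bank`, given its open row."""
        # A new batch forms only once no marked request remains in any bank.
        if not any(r.marked for r in self.queue):
            self._form_batch()
        candidates = [r for r in self.queue if r.bank == bank]
        if not candidates:
            return None
        unranked = len(self.rank)  # threads outside the batch rank last
        best = min(
            candidates,
            key=lambda r: (
                not r.marked,                       # 1. marked requests first
                r.row != open_row,                  # 2. row-buffer hits first
                self.rank.get(r.thread, unranked),  # 3. higher-ranked thread first
                r.arrival,                          # 4. oldest first
            ),
        )
        self.queue = [r for r in self.queue if r is not best]
        return best
```

For example, if thread 1 has three older requests queued to bank 0 while thread 0 has one request each to banks 0 and 1, thread 0 ranks higher. Bank 0 therefore serves thread 0 first (absent a row-buffer hit for thread 1), so thread 0's two accesses can overlap instead of waiting behind thread 1's queue.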

Index Terms:
memory controllers, DRAM, memory-level parallelism, fairness, multicore, quality of service, chip multiprocessors.
Onur Mutlu, Thomas Moscibroda, "Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Shared Memory Controllers," IEEE Micro, vol. 29, no. 1, pp. 22-32, Jan.-Feb. 2009, doi:10.1109/MM.2009.12