Issue No. 1, January/February 2011 (vol. 31)
pp. 78-89
Michael Papamichael, Carnegie Mellon University
Yoongu Kim, Carnegie Mellon University
Mor Harchol-Balter, Carnegie Mellon University
Memory schedulers in multicore systems should carefully schedule memory requests from different threads to ensure high system performance and fair, fast progress of each thread. No existing memory scheduler provides both the highest system performance and highest fairness. Thread Cluster Memory Scheduling is a new algorithm that achieves the best of both worlds by differentiating latency-sensitive threads from bandwidth-sensitive ones and employing different scheduling policies for each.
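The core idea in the abstract, splitting threads into a latency-sensitive group and a bandwidth-sensitive group and treating each group differently, can be illustrated with a minimal sketch. This is not the authors' implementation. The `SimThread` fields, the use of misses per kilo-instruction (`mpki`) as the memory-intensity metric, and the `cluster_thresh` parameter are all assumptions made for illustration; the abstract specifies none of them.

```python
from dataclasses import dataclass


@dataclass
class SimThread:
    """Hypothetical per-thread statistics gathered over one interval."""
    name: str
    mpki: float       # assumed intensity metric: misses per kilo-instruction
    bandwidth: float  # assumed bandwidth consumed, in arbitrary units


def cluster_threads(threads, cluster_thresh=0.1):
    """Split threads into (latency_sensitive, bandwidth_sensitive).

    Assumption: the least memory-intensive threads join the
    latency-sensitive cluster until their combined bandwidth would
    exceed cluster_thresh of the total; everything after that point
    is treated as bandwidth-sensitive.
    """
    budget = cluster_thresh * sum(t.bandwidth for t in threads)
    latency, bandwidth = [], []
    used = 0.0
    for t in sorted(threads, key=lambda t: t.mpki):
        # Once one thread overflows the budget, all heavier threads follow it.
        if not bandwidth and used + t.bandwidth <= budget:
            latency.append(t)
            used += t.bandwidth
        else:
            bandwidth.append(t)
    return latency, bandwidth


def priority_order(threads, cluster_thresh=0.1):
    """Return thread names from highest to lowest priority.

    Latency-sensitive threads are ranked first. The order inside the
    bandwidth-sensitive cluster is left fixed here; a real scheduler
    would likely vary it over time to keep that cluster fair.
    """
    latency, bandwidth = cluster_threads(threads, cluster_thresh)
    return [t.name for t in latency + bandwidth]


if __name__ == "__main__":
    workload = [
        SimThread("a", mpki=0.5, bandwidth=1.0),
        SimThread("b", mpki=1.0, bandwidth=2.0),
        SimThread("c", mpki=20.0, bandwidth=50.0),
        SimThread("d", mpki=30.0, bandwidth=47.0),
    ]
    print(priority_order(workload))
```

In this toy workload the two light threads fit within the assumed 10 percent bandwidth budget and are ranked ahead of the two heavy ones. The sketch covers only the clustering step and a static ranking; it does not model DRAM banks, row buffers, or any fairness mechanism within the bandwidth-sensitive cluster.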
Keywords: memory controller, memory scheduling algorithms, thread cluster, latency-sensitive threads, bandwidth-sensitive threads, memory intensity, row-buffer locality, bank-level parallelism, memory-level parallelism, fairness, system throughput, multicore, multithreaded systems, multiprocessors, quality of service
Michael Papamichael, Yoongu Kim, Mor Harchol-Balter, "Thread Cluster Memory Scheduling," IEEE Micro, vol. 31, no. 1, pp. 78-89, January/February 2011, doi:10.1109/MM.2011.15