Thread Cluster Memory Scheduling
January/February 2011 (vol. 31 no. 1)
pp. 78-89
Yoongu Kim, Carnegie Mellon University
Michael Papamichael, Carnegie Mellon University
Onur Mutlu, Carnegie Mellon University
Mor Harchol-Balter, Carnegie Mellon University

Memory schedulers in multicore systems should carefully schedule memory requests from different threads to ensure high system performance and fair, fast progress of each thread. No existing memory scheduler provides both the highest system performance and the highest fairness. Thread Cluster Memory Scheduling is a new algorithm that achieves the best of both worlds by differentiating latency-sensitive threads from bandwidth-sensitive ones and employing different scheduling policies for each.
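
The abstract does not say how threads are separated or how each group is scheduled, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that memory intensity (one of the index terms) is measured as misses per kilo-instruction (`mpki`), that the least intensive threads are placed in the latency-sensitive cluster until they would exceed a fraction `cluster_thresh` of total bandwidth usage, and that a plain rotation stands in for whatever policy the bandwidth-sensitive cluster actually uses. The names `Thread`, `cluster_threads`, `priority_order`, `cluster_thresh`, and `shuffle_step` are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    mpki: float       # assumed intensity metric: misses per kilo-instruction
    bandwidth: float  # assumed: memory bandwidth consumed in the last interval


def cluster_threads(threads, cluster_thresh=0.2):
    """Split threads into (latency_sensitive, bandwidth_sensitive) lists.

    Threads are visited from least to most memory-intensive. They join the
    latency-sensitive cluster while the cluster's summed bandwidth stays
    within cluster_thresh of the total; once one thread fails that check,
    it and every more intensive thread go to the bandwidth-sensitive cluster.
    """
    total = sum(t.bandwidth for t in threads)
    latency, bandwidth = [], []
    used = 0.0
    for t in sorted(threads, key=lambda t: t.mpki):
        if not bandwidth and used + t.bandwidth <= cluster_thresh * total:
            latency.append(t)
            used += t.bandwidth
        else:
            bandwidth.append(t)
    return latency, bandwidth


def priority_order(latency, bandwidth, shuffle_step=0):
    """Return thread names from highest to lowest priority.

    Latency-sensitive threads always come first, least intensive first.
    Bandwidth-sensitive threads are rotated by shuffle_step so that no
    single thread is permanently ranked last (an illustrative stand-in,
    not the article's policy).
    """
    k = shuffle_step % len(bandwidth) if bandwidth else 0
    rotated = bandwidth[k:] + bandwidth[:k]
    return [t.name for t in latency] + [t.name for t in rotated]


if __name__ == "__main__":
    workload = [
        Thread("A", mpki=0.5, bandwidth=1.0),
        Thread("B", mpki=1.0, bandwidth=1.0),
        Thread("C", mpki=20.0, bandwidth=40.0),
        Thread("D", mpki=30.0, bandwidth=58.0),
    ]
    lat, bw = cluster_threads(workload)
    for step in range(2):
        print(step, priority_order(lat, bw, shuffle_step=step))
```

In this sketch, the two light threads always outrank the two heavy ones, while the heavy threads trade places from one interval to the next. That mirrors the abstract's split between a policy aimed at fast progress for latency-sensitive threads and a different policy for bandwidth-sensitive ones.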

Index Terms:
Memory controller, memory scheduling algorithms, thread cluster, latency-sensitive threads, bandwidth-sensitive threads, memory intensity, row-buffer locality, bank-level parallelism, memory-level parallelism, fairness, system throughput, multicore, multithreaded systems, multiprocessors, quality of service
Citation:
Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter, "Thread Cluster Memory Scheduling," IEEE Micro, vol. 31, no. 1, pp. 78-89, Jan.-Feb. 2011, doi:10.1109/MM.2011.15