Aérgia: A Network-on-Chip Exploiting Packet Latency Slack
January/February 2011 (vol. 31 no. 1)
pp. 29-41
Reetuparna Das, Pennsylvania State University
Onur Mutlu, Carnegie Mellon University
Thomas Moscibroda, Microsoft Research
Chita R. Das, Pennsylvania State University

A traditional network-on-chip (NoC) employs simple arbitration strategies, such as round-robin or oldest-first, that treat packets equally regardless of their source applications' characteristics. This is suboptimal because packets can differ in how strongly they affect system performance. We define slack as a key measure of a packet's relative importance. Aérgia introduces new router prioritization policies that exploit interfering packets' available slack to improve overall system performance and fairness.
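
The abstract describes prioritizing interfering packets according to their available slack. Below is a minimal Python sketch of how a router output-port arbiter could order buffered packets by an already-computed slack estimate, with a coarse batch field to bound starvation. The Packet fields, the batching scheme, and the slack estimate itself are illustrative assumptions, not the router implementation described in the paper.

# A minimal sketch of slack-based output-port arbitration, assuming each
# packet carries a precomputed slack estimate (for example, derived from how
# many predecessor misses were outstanding when it was injected). All names
# and fields here are illustrative assumptions.

from dataclasses import dataclass
import heapq
import itertools

_seq = itertools.count()  # unique tie-breaker so the heap never compares Packets

@dataclass
class Packet:
    src_app: int     # injecting application (core)
    est_slack: int   # estimated cycles the packet can be delayed without hurting its core
    batch_id: int    # coarse age group; older batches go first to avoid starvation

class SlackArbiter:
    """Grants the port to the buffered packet with the least estimated slack,
    serving older batches first so high-slack packets still make progress."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, int, Packet]] = []

    def enqueue(self, pkt: Packet) -> None:
        # Priority key: (older batch, smaller slack) wins arbitration.
        heapq.heappush(self._heap, (pkt.batch_id, pkt.est_slack, next(_seq), pkt))

    def grant(self) -> Packet | None:
        return heapq.heappop(self._heap)[-1] if self._heap else None

# A low-slack (latency-critical) packet beats a high-slack one from the same batch.
arb = SlackArbiter()
arb.enqueue(Packet(src_app=0, est_slack=40, batch_id=0))
arb.enqueue(Packet(src_app=1, est_slack=5, batch_id=0))
assert arb.grant().src_app == 1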

Index Terms:
On-chip networks, multicore, arbitration, prioritization, memory systems, packet scheduling, slack, criticality
Citation:
Reetuparna Das, Onur Mutlu, Thomas Moscibroda, Chita R. Das, "Aérgia: A Network-on-Chip Exploiting Packet Latency Slack," IEEE Micro, vol. 31, no. 1, pp. 29-41, Jan.-Feb. 2011, doi:10.1109/MM.2010.98