Issue No.02 - February (2012 vol.23)
pp: 242-254
Yuho Jin, University of Southern California, Los Angeles
Eun Jung Kim, Texas A&M University, College Station
Timothy Mark Pinkston, University of Southern California, Los Angeles
ABSTRACT
With continued Moore's law scaling, multicore-based architectures are becoming the de facto design paradigm for achieving low-cost and performance/power-efficient processing systems through effective exploitation of available parallelism in software and hardware. A crucial subsystem within multicores is the on-chip interconnection network, which orchestrates high-bandwidth, low-latency, and low-power communication of data. Much previous work has focused on improving the design of on-chip networks, but without fully considering the on-chip communication behavior of application workloads that the network design can exploit. A significant portion of this paper analyzes and models the on-chip network traffic characteristics of representative application workloads. Building on this analysis, the paper proposes the notion of globally coordinated on-chip networks, in which application communication behavior, captured by traffic profiling, is used in the design and configuration of on-chip networks so that prevailing traffic flows are supported well in a globally coordinated manner. This notion is applied to the design of a hybrid network consisting of a mesh augmented with configurable multidrop (bus-like) spanning channels, which serve as express paths for the traffic flows that benefit from them according to the characterized traffic profile. Evaluations reveal that, with globally coordinated on-chip networks, network latency and energy consumption for a 64-core system running OpenMP benchmarks improve on average by 15 and 27 percent, respectively.
INDEX TERMS
On-chip network design, application communication characterization, performance modeling.
CITATION
Yuho Jin, Eun Jung Kim, Timothy Mark Pinkston, "Communication-Aware Globally-Coordinated On-Chip Networks," IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 2, pp. 242-254, February 2012, doi:10.1109/TPDS.2011.164