Issue No.01 - Jan. (2014 vol.25)
pp: 83-92
Randy Morris, Sch. of Electr. Eng. & Comput. Sci., Ohio Univ., Athens, OH, USA
Evan Jolley, Sch. of Electr. Eng. & Comput. Sci., Ohio Univ., Athens, OH, USA
Avinash Karanth Kodi, Sch. of Electr. Eng. & Comput. Sci., Ohio Univ., Athens, OH, USA
ABSTRACT
As the number of cores on a single chip increases exponentially, the design and integration of both the on-chip network facilitating intercore communication and the cache coherence protocol enabling shared memory programming have become critical for improved energy efficiency and overall chip performance. With traditional metal interconnects facing stringent energy constraints, researchers are currently pursuing disruptive solutions such as nanophotonics for improved energy efficiency. Cache coherence in multicores can be enforced effectively by snoopy protocols; however, broadcasting every cache miss can limit scalability while consuming excess energy. In this paper, we propose PULSE, a nanophotonic broadcast tree-based network for snoopy cache coherent multicores. To limit the energy penalty from broadcasting (and thereby splitting) optical signals, we direct the optical signal from the external laser such that only the subset of requesters can receive the optical signal. Furthermore, as cache blocks are shared by a few cores, we propose a multicast version of PULSE called multi-PULSE that predicts the sharers for each L2 miss and morphs the broadcast network into a multicast network. We evaluate the energy and performance using CACTI and SIMICS on 16-core and 64-core versions of PULSE and multi-PULSE for Splash-2, PARSEC, and SPEC CPU2006 benchmarks and compare them to electrical networks, optical networks, and other cache filtering techniques. Our results indicate that PULSE outperforms competitive electrical/optical networks by 60 percent in terms of execution time, and multi-PULSE reduces average energy by 10 to 80 percent even with a few mispredictions.
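The energy penalty of splitting an optical signal can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's model: it assumes an ideal, lossless even split, in which each of N receivers gets 1/N of the power, so the per-receiver loss is 10·log10(N) dB. The function name and the sharer count of 4 are hypothetical, chosen only to contrast a full broadcast with a small multicast on the 16-core and 64-core sizes mentioned in the abstract.

```python
import math


def ideal_split_loss_db(receivers: int) -> float:
    """Per-receiver loss in dB when power is split evenly among `receivers`.

    Assumption: an ideal splitter with no excess, waveguide, or coupling loss.
    """
    if receivers < 1:
        raise ValueError("need at least one receiver")
    return 10.0 * math.log10(receivers)


if __name__ == "__main__":
    hypothetical_sharers = 4  # illustrative value, not taken from the paper
    for cores in (16, 64):
        broadcast_db = ideal_split_loss_db(cores)
        multicast_db = ideal_split_loss_db(hypothetical_sharers)
        print(
            f"{cores} cores: broadcast split loss {broadcast_db:.2f} dB, "
            f"multicast to {hypothetical_sharers} sharers {multicast_db:.2f} dB"
        )
```

Under this idealized assumption, the laser power needed to reach every receiver grows linearly with the number of receivers, which is one way to see why restricting delivery to the predicted sharers could save energy.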
INDEX TERMS
Multicore processing, Optical waveguides, Optical pulses, Adaptive optics, Optical ring resonators, Optical receivers, multicast, Network-on-chips, nanophotonics, cache coherence, broadcast
CITATION
Randy Morris, Evan Jolley, Avinash Karanth Kodi, "Extending the Performance and Energy-Efficiency of Shared Memory Multicores with Nanophotonic Technology", IEEE Transactions on Parallel & Distributed Systems, vol.25, no. 1, pp. 83-92, Jan. 2014, doi:10.1109/TPDS.2013.26