Issue No.06 - November/December (2010 vol.30)
pp: 25-35
Christopher Hughes, Intel, Santa Clara
Changkyu Kim, Intel, Santa Clara
Yen-Kuang Chen, Intel Corporation, Santa Clara
ABSTRACT
Processors that target throughput computing often have many cores, which stresses the cache hierarchy. Logically centralized, shared data storage is needed for many-core chips to provide high cache throughput for heavily read-write shared lines. Techniques to reduce on-die and off-die traffic have a dramatic energy benefit for many-core chips.
INDEX TERMS
multicore/single-chip multiprocessors, memory hierarchy, graphics processors, throughput computing
CITATION
Christopher Hughes, Changkyu Kim, Yen-Kuang Chen, "Performance and Energy Implications of Many-Core Caches for Throughput Computing," IEEE Micro, vol. 30, no. 6, pp. 25-35, November/December 2010, doi:10.1109/MM.2010.83