Issue No.06 - Nov.-Dec. (2012 vol.32)
pp: 4-16
Manish Arora, University of California, San Diego
Siddhartha Nath, University of California, San Diego
Subhra Mazumdar, University of California, San Diego
Scott B. Baden, University of California, San Diego
Dean M. Tullsen, University of California, San Diego
ABSTRACT
In an integrated CPU-GPU system, the CPU executes code that is profoundly different from the code it ran in past CPU-only environments. This new code's characteristics should drive future CPU design and architecture. Post-GPU code has lower instruction-level parallelism, more difficult branch prediction, and loads and stores that are significantly harder to predict. It also gains much less from the availability of multiple cores, owing to reduced thread-level parallelism.
INDEX TERMS
Graphics processing unit, Benchmark testing, Central Processing Unit, Parallel processing, Hidden Markov models, Computational modeling, CPU architecture, CPU-GPU systems, heterogeneous designs
CITATION
Manish Arora, Siddhartha Nath, Subhra Mazumdar, Scott B. Baden, Dean M. Tullsen, "Redefining the Role of the CPU in the Era of CPU-GPU Integration", IEEE Micro, vol.32, no. 6, pp. 4-16, Nov.-Dec. 2012, doi:10.1109/MM.2012.57