2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2008)
Toronto, ON, Canada
Oct. 25, 2008 to Oct. 29, 2008
ISBN: 978-1-5090-3021-7
pp: 52-61
Henry Wong , Dept. of Electrical and Computer Engineering, University of British Columbia, Canada
Anne Bracy , Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Ethan Schuchman , Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Tor M. Aamodt , Dept. of Electrical and Computer Engineering, University of British Columbia, Canada
Jamison D. Collins , Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Perry H. Wang , Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Gautham Chinya , Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Ankur Khandelwal Groen , Digital Enterprise Group, Intel Corporation, USA
Hong Jiang , Graphics Architecture, Mobility Groups, Intel Corporation, USA
Hong Wang , Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
ABSTRACT
Moore's Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration, which physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, in which the hardware budget dedicated to 3D-specific graphics processing is used instead to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grained, shared-memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully functional, synthesizable RTL based on the production-quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. In a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware occupies the area of 9 GPU cores and consumes as much total power as 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on an FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose, non-graphics workloads demonstrates speedups of up to 8.8×.
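Why the spawn-latency reduction matters for fine-grained collaboration can be seen with a back-of-the-envelope model. The sketch below is a simplification, not the authors' analysis: it assumes each GPU thread pays a fixed spawn latency that is not overlapped with useful work, and it uses a hypothetical value of 2000 cycles as a stand-in for "thousands of cycles", with 30 cycles taken as the upper bound quoted above. The function name `useful_fraction` and the task sizes are illustrative only.

```python
# Hypothetical model: fraction of a GPU thread's lifetime spent on useful work
# when each spawn incurs a fixed, non-overlapped latency before execution.
# LEGACY_SPAWN = 2000 is an ASSUMED stand-in for "thousands of cycles";
# PANGAEA_SPAWN = 30 is the "fewer than 30 cycles" bound, used as an upper bound.

LEGACY_SPAWN = 2000
PANGAEA_SPAWN = 30


def useful_fraction(task_cycles: int, spawn_latency: int) -> float:
    """Return task_cycles / (spawn_latency + task_cycles).

    Assumes spawns are serialized with execution (no overlap), which
    overstates the cost of spawning but shows the trend clearly.
    """
    if task_cycles <= 0:
        raise ValueError("task_cycles must be positive")
    if spawn_latency < 0:
        raise ValueError("spawn_latency must be non-negative")
    return task_cycles / (spawn_latency + task_cycles)


if __name__ == "__main__":
    # Compare short, medium, and long hypothetical GPU tasks.
    for task in (100, 1_000, 10_000):
        legacy = useful_fraction(task, LEGACY_SPAWN)
        pangaea = useful_fraction(task, PANGAEA_SPAWN)
        print(f"task={task:>6} cycles  legacy={legacy:.3f}  pangaea={pangaea:.3f}")
```

Under these assumptions, a 100-cycle task spends under 5% of its time doing useful work with a 2000-cycle spawn, but roughly 77% with a 30-cycle spawn; only long-running tasks can amortize a spawn cost in the thousands of cycles.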
INDEX TERMS
Instruction sets, Multicore processing, Graphics processing units, Graphics, Hardware
CITATION
Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, Jamison D. Collins, Perry H. Wang, Gautham Chinya, Ankur Khandelwal Groen, Hong Jiang, Hong Wang, "Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor", 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 52-61, 2008.