2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2008)
Toronto, ON, Canada
Oct. 25, 2008 to Oct. 29, 2008
ISBN: 978-1-5090-3021-7
pp: 52-61
Henry Wong, Dept. of Electrical and Computer Engineering, University of British Columbia, Canada
Anne Bracy, Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Ethan Schuchman, Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Tor M. Aamodt, Dept. of Electrical and Computer Engineering, University of British Columbia, Canada
Jamison D. Collins, Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Perry H. Wang, Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Gautham Chinya, Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
Ankur Khandelwal Groen, Digital Enterprise Group, Intel Corporation, USA
Hong Jiang, Graphics Architecture, Mobility Groups, Intel Corporation, USA
Hong Wang, Microarchitecture Research Lab, Microprocessor Technology Labs, Intel Corporation, USA
ABSTRACT
Moore's Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, where the hardware budget dedicated to 3D-specific graphics processing is used to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared-memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully functional synthesizable RTL based on the production-quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware has the area of 9 GPU cores and the total power consumption of 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on an FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose non-graphics workloads demonstrates speedups of up to 8.8×.
INDEX TERMS
Instruction sets, Multicore processing, Graphics processing units, Graphics, Hardware
CITATION

H. Wong et al., "Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor," 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), Toronto, ON, Canada, 2008, pp. 52-61.