2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
ISBN: 978-1-5090-6609-4
pp: 191-200
Jingweijia Tan , Department of Electrical Engineering and Computer Science University of Kansas Lawrence, 66045 USA
Xin Fu , Department of Electrical Engineering and Computer Science University of Kansas Lawrence, 66045 USA
ABSTRACT
With hundreds of cores integrated into a single chip, general-purpose graphics processing units (GPGPUs) provide high computing power to accelerate parallel applications. However, they are prone to high soft-error vulnerability because they lack fault detection and tolerance. In particular, the streaming processors become the reliability hot-spot in GPGPUs. This paper explores two opportunistic soft-error detection techniques to cost-effectively improve the reliability of the streaming processors. Observing that the streaming processors are not fully utilized during branch divergence and during pipeline stalls caused by long-latency operations, we propose to Recycle the streaming processors' Idle time for Soft-Error detection (RISE) and obtain good fault coverage with negligible performance degradation. RISE is composed of full-RISE and partial-RISE. Full-RISE selectively triggers redundancy for a set of warps so as to use the fully idle streaming processors during pipeline stall time for error detection. Partial-RISE performs redundancy for a number of threads in certain warps using the partially idle streaming processors during branch divergence. Our experimental results show that RISE improves the soft-error reliability of the streaming processors (SPs) by 43% with negligible (e.g., 4%) performance loss.
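The idea behind partial-RISE can be illustrated with a minimal Python sketch. It is not the mechanism from the paper, whose details the abstract does not give. It assumes a 32-lane warp, and the function names and the pairing policy (idle lanes handed to active lanes in ascending order) are hypothetical. Given a divergence mask, lanes that would otherwise sit idle re-execute the work of active lanes so that the two results can later be compared.

```python
from typing import Dict, List

WARP_SIZE = 32  # assumed warp width; not stated in the abstract


def assign_redundant_lanes(active_mask: List[bool]) -> Dict[int, int]:
    """Map each usable idle lane to the active lane whose work it repeats.

    Hypothetical policy: idle lanes are paired with active lanes in
    ascending order until either list runs out.
    """
    active = [i for i, on in enumerate(active_mask) if on]
    idle = [i for i, on in enumerate(active_mask) if not on]
    return dict(zip(idle, active))


def redundancy_coverage(active_mask: List[bool]) -> float:
    """Fraction of active threads that receive a redundant copy."""
    n_active = sum(active_mask)
    if n_active == 0:
        return 0.0
    return len(assign_redundant_lanes(active_mask)) / n_active


# Divergent branch where only lanes 0..23 are active: 8 lanes are idle.
mask = [lane < 24 for lane in range(WARP_SIZE)]
pairs = assign_redundant_lanes(mask)
print(pairs[24], pairs[31])        # idle lane 24 repeats lane 0, lane 31 repeats lane 7
print(redundancy_coverage(mask))   # 8 of the 24 active threads are covered
```

Under this toy policy, coverage depends on how strongly the warp diverges: when at most half of the lanes are active, every active thread can be duplicated, and coverage falls as more lanes stay active.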
INDEX TERMS
Instruction sets, Graphics processing units, Pipelines, Redundancy, Computer architecture, Streaming multiprocessors, GPGPU, Reliability, Soft Errors
CITATION
Jingweijia Tan, Xin Fu, "RISE: Improving the streaming processors reliability against soft errors in GPGPUs", 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 191-200, 2012