Issue No. 02 - July-Dec. (2015 vol. 14)
ISSN: 1556-6056
pp: 103-106
William Song , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
Saibal Mukhopadhyay , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
Sudhakar Yalamanchili , School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA
ABSTRACT
This paper presents a lifetime reliability characterization of many-core processors based on a full-system simulation of integrated microarchitecture, power, thermal, and reliability models. Under normal operating conditions, our model and analysis reveal that the mean-time-to-failure of the cores on the die follows a normal distribution. From the processor-level perspective, the key insight is that reducing the variance of this distribution can improve lifetime reliability by avoiding early failures. Based on this understanding, we present two variance reduction techniques for proactive reliability management: i) proportional dynamic voltage-frequency scaling (DVFS) and ii) coordinated thread swapping. A major advantage of variance reduction techniques is that they improve system lifetime reliability without adding design margins or spare components.
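The variance argument can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation framework: the function name `mean_first_failure`, the core count, and the MTTF values are hypothetical, and it assumes that per-core MTTFs are independent normal draws and that the processor's lifetime ends at its first core failure. Under those assumptions, narrowing the spread at a fixed mean pushes the earliest failure later.

```python
import random
import statistics


def mean_first_failure(num_cores, mean_mttf, std_mttf, trials=2000, seed=0):
    """Estimate the expected time of the earliest core failure.

    Assumption (illustrative, not taken from the paper): per-core MTTF
    values are independent draws from Normal(mean_mttf, std_mttf), and the
    processor's lifetime ends when its first core fails.
    """
    rng = random.Random(seed)
    first_failures = []
    for _ in range(trials):
        # One simulated die: draw an MTTF for every core.
        core_mttfs = [rng.gauss(mean_mttf, std_mttf) for _ in range(num_cores)]
        # The weakest core determines when the first failure occurs.
        first_failures.append(min(core_mttfs))
    return statistics.mean(first_failures)


if __name__ == "__main__":
    # Same mean MTTF (hypothetical units: years), different spread.
    wide = mean_first_failure(num_cores=64, mean_mttf=10.0, std_mttf=2.0)
    narrow = mean_first_failure(num_cores=64, mean_mttf=10.0, std_mttf=0.5)
    print(f"wide spread:   first failure after about {wide:.2f} years")
    print(f"narrow spread: first failure after about {narrow:.2f} years")
```

Both runs share the same mean MTTF, so any gain in the narrow case comes only from the reduced variance, which is the effect the two proposed techniques aim for.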
INDEX TERMS
Program processors, Degradation, Microarchitecture, Benchmark testing, Integrated circuit reliability, Gaussian distribution
CITATION

W. Song, S. Mukhopadhyay and S. Yalamanchili, "Architectural Reliability: Lifetime Reliability Characterization and Management of Many-Core Processors," in IEEE Computer Architecture Letters, vol. 14, no. 2, pp. 103-106, 2015.
doi:10.1109/LCA.2014.2340873