Issue No. 01 - January (2011 vol. 60)
DOI: http://doi.ieeecomputersociety.org/10.1109/TC.2010.205
Shantanu Gupta, University of Michigan, Ann Arbor
Shuguang Feng, University of Michigan, Ann Arbor
Amin Ansari, University of Michigan, Ann Arbor
Scott Mahlke, University of Michigan, Ann Arbor
CMOS scaling has long been a source of dramatic performance gains. However, shrinking semiconductor feature sizes have led to rising operating temperatures and current densities. Because most wearout mechanisms depend strongly on these parameters, significantly higher failure rates are projected for future technology generations. Consequently, fault tolerance, traditionally a concern of the high-end server market, is now receiving emphasis in mainstream computing systems. The popular solution has been redundancy at a coarse granularity, such as dual or triple modular redundancy. In this work, we challenge the practice of coarse-grained redundancy by identifying its inability to scale to high-failure-rate scenarios, and we investigate the advantages of finer-grained configurations. To this end, this paper presents and evaluates a highly reconfigurable CMP architecture, named StageNet (SN), that is designed with reliability as a first-class design criterion. SN relies on a reconfigurable network of replicated processor pipeline stages to maximize the useful lifetime of a chip, gracefully degrading performance toward the end of its life. Our results show that the proposed SN architecture can perform 40 percent more cumulative work than a traditional CMP over a 12-year lifetime.
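The claim that coarse-grained redundancy scales poorly while stage-level reconfiguration degrades gracefully can be illustrated with a toy model. The sketch below is not the authors' simulator or the actual SN design. It assumes that each core is a pipeline of the same stage types, that each stage wears out independently with probability `p`, and that in a stage-level fabric any healthy stage can be linked to any healthy stage of the next type (full connectivity, a fault-free interconnect, and no performance penalty; the real SN interconnect may be more restricted). All function names and parameters are hypothetical.

```python
import random
from typing import List

# failed[c][s] is True when stage s of core c has worn out (hypothetical model).
FailureMap = List[List[bool]]


def working_cores_coarse(failed: FailureMap) -> int:
    """Coarse-grained view: a core is usable only if every one of its stages is healthy."""
    return sum(1 for core in failed if not any(core))


def working_pipelines_stagenet(failed: FailureMap) -> int:
    """Idealized stage-level view: healthy stages of each type can be freely
    recombined, so the number of complete pipelines is limited by the scarcest
    stage type. Assumes full connectivity and a fault-free interconnect."""
    if not failed:
        return 0
    num_stages = len(failed[0])
    healthy_per_stage = [
        sum(1 for core in failed if not core[s]) for s in range(num_stages)
    ]
    return min(healthy_per_stage)


def random_failures(num_cores: int, num_stages: int, p: float,
                    rng: random.Random) -> FailureMap:
    """Draw an independent failure (True) for each stage with probability p."""
    return [[rng.random() < p for _ in range(num_stages)] for _ in range(num_cores)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    cores, stages, trials = 16, 5, 2000
    for p in (0.05, 0.1, 0.2):
        coarse = fine = 0
        for _ in range(trials):
            f = random_failures(cores, stages, p, rng)
            coarse += working_cores_coarse(f)
            fine += working_pipelines_stagenet(f)
        print(f"p={p}: coarse={coarse / trials:.2f}, stage-level={fine / trials:.2f}")
```

For example, with two cores where core 0 has lost stage 0 and core 1 has lost stage 1, `working_cores_coarse` returns 0, but `working_pipelines_stagenet` returns 1 because the surviving stages can be stitched into one complete pipeline. Under this model the stage-level count is never lower than the coarse-grained count, and the gap widens as `p` grows.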
Index Terms: Reliability, fault tolerance, multicore, CMP, wearout.
S. Gupta, S. Feng, A. Ansari, and S. Mahlke, "StageNet: A Reconfigurable Fabric for Constructing Dependable CMPs," IEEE Transactions on Computers, vol. 60, no. 1, pp. 5-19, Jan. 2011.