Issue No.01 - Jan. (2014 vol.25)
pp: 254-263
Reiley Jeyapaul, Arizona State University, Tempe
Fei Hong, Arizona State University, Tempe
Abhishek Rhisheekesan, Arizona State University, Tempe
Aviral Shrivastava, Arizona State University, Tempe
Kyoungwoo Lee, Yonsei University, Seoul
ABSTRACT
Shrinking device dimensions, increasing transistor densities, and smaller timing windows expose processors to soft errors induced by charge-carrying particles. Since these factors are inevitable consequences of advancing processor technology, the industry has been forced to improve the reliability of general-purpose chip multiprocessors (CMPs). With increased hardware resources available, redundancy-based techniques are the most promising methods to eradicate soft-error failures in CMP systems. In this work, we propose a novel customizable and redundant CMP architecture (UnSync) that uses hardware-based detection mechanisms (most of which are readily available in the processor) to reduce overheads during error-free execution. In the presence of errors (which are infrequent), a recovery mechanism based on always-forward execution keeps the system resilient. The architecture inherently supports customization of the redundancy, and thereby provides a means to trade performance against reliability in many-core systems. We provide a redundancy-based soft-error resilient CMP architecture for both write-through and write-back cache configurations. We design a detailed RTL model of the UnSync architecture and perform hardware synthesis to measure the hardware (power/area) overheads incurred, and compare them with those of the Reunion technique, a state-of-the-art redundant multicore architecture. We also perform cycle-accurate simulations over a wide range of SPEC2000 and MiBench benchmarks to evaluate performance relative to the Reunion architecture. Experimental results show that, for the same level of reliability, the UnSync architecture reduces power consumption by 34.5 percent and improves performance by up to 20 percent, with 13.3 percent less area overhead, compared to the Reunion architecture.
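The abstract does not spell out the detection or recovery protocol, so the following is only a toy sketch of the general idea it describes: two redundant cores run the same instruction stream, each detects errors locally in hardware, and recovery moves forward rather than rolling back. Everything concrete here is an assumption made for illustration and is not taken from the paper: the per-register parity check, the `Core` class, the `run_pair` function, the tiny add-immediate "program" format, and the choice to recover by copying state from the error-free core.

```python
from dataclasses import dataclass, field


def parity(value: int) -> int:
    """Return the parity (0 or 1) of the set bits in a non-negative integer."""
    return bin(value).count("1") & 1


@dataclass
class Core:
    """Hypothetical core: four registers, each guarded by one parity bit."""
    regs: list = field(default_factory=lambda: [0] * 4)
    par: list = field(default_factory=lambda: [0] * 4)

    def write(self, idx: int, value: int) -> None:
        # A normal write updates the data and its parity bit together.
        self.regs[idx] = value
        self.par[idx] = parity(value)

    def flip_bit(self, idx: int, bit: int) -> None:
        # Model a soft error: the data changes but the parity bit does not.
        self.regs[idx] ^= 1 << bit

    def has_error(self) -> bool:
        # Local, hardware-style detection: stored parity must match the data.
        return any(parity(v) != p for v, p in zip(self.regs, self.par))

    def copy_state_from(self, other: "Core") -> None:
        self.regs = list(other.regs)
        self.par = list(other.par)


def run_pair(program, fault=None):
    """Run `program` on two redundant cores.

    program: list of (dest, src, imm), meaning regs[dest] = regs[src] + imm.
    fault:   optional (step, core_index, reg, bit) single-bit upset to inject.
    Returns (core0_regs, core1_regs, number_of_recoveries).
    """
    cores = [Core(), Core()]
    recoveries = 0
    for step, (dest, src, imm) in enumerate(program):
        if fault is not None and fault[0] == step:
            _, core_index, reg, bit = fault
            cores[core_index].flip_bit(reg, bit)
        # Assumed recovery policy: the faulty core adopts the state of its
        # error-free partner and both continue forward (no rollback).
        for i, core in enumerate(cores):
            if core.has_error():
                core.copy_state_from(cores[1 - i])
                recoveries += 1
        for core in cores:
            core.write(dest, core.regs[src] + imm)
    return cores[0].regs, cores[1].regs, recoveries


program = [(0, 0, 5), (1, 0, 3), (2, 1, 1)]
clean = run_pair(program)
faulty = run_pair(program, fault=(1, 0, 0, 1))  # flip bit 1 of r0 on core 0
print(clean)
print(faulty)
```

In this sketch the error-free path does no cross-core comparison at all, which is the property the abstract credits for the low overhead; the state copy happens only on the rare step where a parity mismatch is seen. A single parity bit cannot detect an even number of bit flips in one register, and the sketch does not handle both cores failing at once.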
INDEX TERMS
Hardware, Redundancy, Multicore processing, Instruction sets, Power efficiency, Multicore architecture, Soft error, CMP, Reliability
CITATION
Reiley Jeyapaul, Fei Hong, Abhishek Rhisheekesan, Aviral Shrivastava, Kyoungwoo Lee, "UnSync-CMP: Multicore CMP Architecture for Energy-Efficient Soft-Error Reliability", IEEE Transactions on Parallel & Distributed Systems, vol.25, no. 1, pp. 254-263, Jan. 2014, doi:10.1109/TPDS.2013.14