Issue No.11 - Nov. (2013 vol.62)
pp: 2252-2265
Bo Zhao, University of Pittsburgh, Pittsburgh
Yu Du, University of Pittsburgh, Pittsburgh
Jun Yang, University of Pittsburgh, Pittsburgh
Youtao Zhang, University of Pittsburgh, Pittsburgh
Process variations in integrated circuits have a significant impact on their performance, leakage, and stability. This is particularly evident in large, regular, and dense structures such as DRAMs. DRAMs are built from minimum-sized transistors, presumably of uniform speed, arranged in an organized array structure. Process variation can introduce latency disparity among different memory arrays. With the proliferation of 3D stacking technology, DRAM has become a favorable choice for stacking on top of a multicore processor as a last-level cache, offering large capacity, high bandwidth, and low power. Hence, variations in bank speed create a unique problem of nonuniform cache accesses in 3D space. In this paper, we investigate cache management techniques for tolerating process variation in a 3D DRAM stacked onto a multicore processor. We model the process variation in a four-layer DRAM memory, including the cell transistor, capacitor trench, and peripheral circuit, to characterize the latency and retention time variations among different banks. As a result, whether a bank is fast or slow from a core's standpoint is no longer tied to the core's physical distance from that bank; it is determined by the bank latencies that result from process variation. We develop cache migration schemes that utilize fast banks while limiting the cost of migration. Our experiments show that there is a great performance benefit in exploiting fast memory banks through migration. On average, variation-aware management improves the performance of a workload by 16.5 percent over the baseline, in which the speed of one of the slowest banks is assumed for all banks. This is also within 0.8 percent of the performance of an ideal memory with no process variation.
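The idea of migrating frequently used lines into faster banks can be illustrated with a toy model. The sketch below is not the scheme developed in the paper; it is a minimal illustration under stated assumptions: the function name `simulate`, the hit-count threshold, the fixed migration cost, the address-interleaved initial placement, and all latency values are hypothetical, and misses to lower memory levels are not modeled.

```python
def simulate(trace, latencies, capacity, threshold=2, migrate_cost=0):
    """Toy model: return total access cycles for a trace of line addresses.

    latencies[i] is the (hypothetical) access latency of bank i in cycles.
    A line that reaches `threshold` hits is moved to the fastest bank that
    is strictly faster than its current bank and still has a free slot.
    """
    n = len(latencies)
    banks = [set() for _ in range(n)]   # lines currently held by each bank
    where = {}                          # line address -> bank index
    hits = {}                           # line address -> hit count
    fastest_first = sorted(range(n), key=lambda i: latencies[i])
    total = 0

    for addr in trace:
        if addr not in where:
            # First touch: place in the address-interleaved home bank.
            home = addr % n
            if len(banks[home]) >= capacity:
                # Evict the least-hit line of that bank (assumed policy).
                victim = min(banks[home], key=lambda a: hits[a])
                banks[home].remove(victim)
                del where[victim], hits[victim]
            banks[home].add(addr)
            where[addr] = home
            hits[addr] = 0
            total += latencies[home]
            continue

        cur = where[addr]
        total += latencies[cur]
        hits[addr] += 1

        if hits[addr] >= threshold:
            for b in fastest_first:
                if latencies[b] >= latencies[cur]:
                    break               # no strictly faster bank remains
                if len(banks[b]) < capacity:
                    banks[cur].remove(addr)
                    banks[b].add(addr)
                    where[addr] = b
                    total += migrate_cost
                    break
    return total


if __name__ == "__main__":
    lat = [30, 20, 40, 25]              # hypothetical per-bank latencies
    trace = [0] * 10                    # one hot line, accessed ten times
    print(simulate(trace, lat, capacity=2, threshold=2, migrate_cost=5))
    print(simulate(trace, lat, capacity=2, threshold=10**9))  # no migration
```

Comparing the two calls shows the trade-off the abstract describes: migration pays a one-time cost to serve later accesses from a faster bank, so it only helps when a line is reused often enough to amortize that cost.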
Three dimensional displays, Random access memory, Microprocessors, Transistors, Decoding, Arrays, NUCA, Process variation, 3D die stacking, DRAM
Bo Zhao, Yu Du, Jun Yang, Youtao Zhang, "Process Variation-Aware Nonuniform Cache Management in a 3D Die-Stacked Multicore Processor", IEEE Transactions on Computers, vol.62, no. 11, pp. 2252-2265, Nov. 2013, doi:10.1109/TC.2012.129