On Task Allocation and Scheduling for Lifetime Extension of Platform-Based MPSoC Designs
December 2011 (vol. 22 no. 12)
pp. 2088-2099
Lin Huang, The Chinese University of Hong Kong, Hong Kong
Feng Yuan, The Chinese University of Hong Kong, Hong Kong
Qiang Xu, The Chinese University of Hong Kong, Hong Kong
With the relentless scaling of semiconductor technology, the lifetime reliability of today's multiprocessor system-on-a-chip (MPSoC) designs has become a major concern for the industry. Because existing works do not explicitly consider this issue during task allocation and scheduling, they may cause imbalanced aging rates among processors and thereby shorten the system's service life. To tackle this problem, in this paper we propose an analytical model that estimates the lifetime reliability of multiprocessor platforms executing periodic tasks, and we present a novel task allocation and scheduling algorithm, based on simulated annealing, that takes the aging effects of processors into account. In addition, to speed up the annealing process, we propose several techniques that simplify design space exploration while maintaining satisfactory solution quality. Experimental results on various hypothetical multiprocessor platforms and task graphs show that the proposed approach achieves significant system lifetime extension, especially for heterogeneous platforms with large task graphs.
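The abstract does not spell out the reliability model or the annealing moves, so the following is only a minimal Python sketch of the general idea under simplifying assumptions that are ours, not the authors': tasks are independent and periodic (no precedence edges), each processor fails according to a Weibull distribution whose scale shrinks as its utilization grows (the shape `BETA` and aging factor `AGING_K` are hypothetical), and the platform is a series system whose service life ends at the first processor failure. Annealing then moves one task at a time between processors to maximize the resulting mean time to failure (MTTF).

```python
import math
import random

# Hypothetical parameters (assumptions, not taken from the paper).
BETA = 2.0     # Weibull shape, assumed identical for all processors
AGING_K = 3.0  # how strongly utilization accelerates aging


def weibull_scale(alpha0, util, k=AGING_K):
    """Assumed aging model: busier processors get a smaller Weibull scale."""
    return alpha0 / (1.0 + k * util)


def system_mttf(alphas, beta=BETA):
    """MTTF of a series system (first processor failure ends service life).

    With a shared shape beta, the product of reliabilities exp(-(t/a_j)^beta)
    is again Weibull with scale (sum a_j^-beta)^(-1/beta), and a Weibull
    MTTF equals scale * Gamma(1 + 1/beta).
    """
    scale = sum(a ** -beta for a in alphas) ** (-1.0 / beta)
    return scale * math.gamma(1.0 + 1.0 / beta)


def utilizations(assign, exec_time, n_proc, period):
    """Busy fraction of each processor; exec_time[task][proc] is heterogeneous."""
    busy = [0.0] * n_proc
    for task, proc in enumerate(assign):
        busy[proc] += exec_time[task][proc]
    return [b / period for b in busy]


def evaluate(assign, exec_time, alpha0, period):
    """System MTTF of an assignment, or None if any processor is overloaded."""
    utils = utilizations(assign, exec_time, len(alpha0), period)
    if any(u > 1.0 for u in utils):
        return None  # infeasible: work does not fit in one period
    return system_mttf([weibull_scale(a, u) for a, u in zip(alpha0, utils)])


def greedy_initial(exec_time, n_proc, period):
    """Start point: place each task where the resulting busy time is smallest."""
    busy = [0.0] * n_proc
    assign = []
    for row in exec_time:
        proc = min(range(n_proc), key=lambda p: busy[p] + row[p])
        busy[proc] += row[proc]
        assign.append(proc)
    if any(b > period for b in busy):
        raise ValueError("greedy start violates the period")
    return assign


def anneal(exec_time, alpha0, period, t0=0.1, cooling=0.995, steps=3000, seed=0):
    """Simulated annealing over task-to-processor assignments, maximizing MTTF."""
    rng = random.Random(seed)
    n_task, n_proc = len(exec_time), len(alpha0)
    current = greedy_initial(exec_time, n_proc, period)
    cur_val = evaluate(current, exec_time, alpha0, period)
    best, best_val = list(current), cur_val
    if n_proc < 2:
        return best, best_val  # nothing to move
    temp = t0
    for _ in range(steps):
        # Neighbor move: reassign one random task to a different processor.
        cand = list(current)
        task = rng.randrange(n_task)
        new_proc = rng.randrange(n_proc - 1)
        if new_proc >= cand[task]:
            new_proc += 1  # skip the task's current processor
        cand[task] = new_proc
        cand_val = evaluate(cand, exec_time, alpha0, period)
        if cand_val is not None:
            delta = (cand_val - cur_val) / cur_val  # relative MTTF change
            # Always accept improvements; accept worse moves with prob exp(delta/T).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, cur_val = cand, cand_val
                if cur_val > best_val:
                    best, best_val = list(current), cur_val
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_val
```

For example, `anneal([[4, 6], [3, 5], [5, 8], [2, 3], [6, 9]], [10.0, 10.0], period=20.0)` returns an assignment and its estimated MTTF for a two-processor platform where processor 0 is faster. Scoring moves by relative MTTF change keeps the acceptance probability independent of the time units chosen for `alpha0`; the paper's own cost function, move set, and speed-up techniques may differ.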


Index Terms:
Lifetime reliability, aging effect, multiprocessor system-on-a-chip, task allocation and scheduling.
Lin Huang, Feng Yuan, Qiang Xu, "On Task Allocation and Scheduling for Lifetime Extension of Platform-Based MPSoC Designs," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 12, pp. 2088-2099, Dec. 2011, doi:10.1109/TPDS.2011.132