High-Performance Energy-Efficient Multicore Embedded Computing
April 2012 (vol. 23 no. 4)
pp. 684-700
Arslan Munir, University of Florida, Gainesville
Sanjay Ranka, University of Florida, Gainesville
Ann Gordon-Ross, University of Florida, Gainesville
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high transistor density for high performance. Embedded systems differ from traditional high-performance supercomputers in that power is a first-order constraint for embedded systems, whereas performance is the major benchmark for supercomputers. The increase in on-chip transistor density exacerbates power/thermal issues in embedded systems, which necessitates novel hardware/software power/thermal management techniques to meet the ever-increasing high-performance embedded computing demands in an energy-efficient manner. This paper outlines typical requirements of embedded applications and discusses state-of-the-art hardware/software high-performance energy-efficient embedded computing (HPEEC) techniques that help meet these requirements. We also discuss modern multicore processors that leverage these HPEEC techniques to deliver high performance per watt. Finally, we present design challenges and future research directions for HPEEC system development.


Index Terms:
High-performance computing (HPC), multicore, energy-efficient computing, green computing, low power, embedded systems.
Arslan Munir, Sanjay Ranka, Ann Gordon-Ross, "High-Performance Energy-Efficient Multicore Embedded Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 4, pp. 684-700, April 2012, doi:10.1109/TPDS.2011.214