Distributed Loop Controller for Multithreading in Unithreaded ILP Architectures
March 2009 (vol. 58 no. 3)
pp. 311-321
Praveen Raghavan, IMEC vzw and KULeuven, Heverlee
Andy Lambrechts, IMEC vzw and KULeuven, Heverlee
Murali Jayapala, IMEC vzw, Heverlee
Francky Catthoor, IMEC Belgium, Leuven
Diederik Verkest, IMEC vzw, Leuven
Reduced energy consumption is one of the most important design goals for embedded application domains such as wireless communication, multimedia, and biomedical applications. The instruction memory hierarchy has been shown to be one of the most power-hungry parts of the system. This paper introduces an architectural enhancement for the instruction memory that reduces energy consumption and improves performance. The proposed distributed instruction memory organization requires minimal hardware overhead and supports the execution of multiple incompatible loops in parallel in a uni-processor system. We present different methods to implement the loop controller architecture, compare them, and show that distributing the instruction memory also helps to reduce the interconnect cost. This architectural enhancement can reduce the energy consumed in the instruction memory hierarchy by 59% and improve performance by 22% compared to enhanced hardware-based SMT architectures.

Index Terms:
RISC/CISC, VLIW architectures, Multithreaded processors, Support for multi-threaded execution, Low-power design, Energy-aware systems
Citation:
Praveen Raghavan, Andy Lambrechts, Murali Jayapala, Francky Catthoor, Diederik Verkest, "Distributed Loop Controller for Multithreading in Unithreaded ILP Architectures," IEEE Transactions on Computers, vol. 58, no. 3, pp. 311-321, March 2009, doi:10.1109/TC.2008.168