Issue No.03 - March (2009 vol.58)
pp: 311-321
Praveen Raghavan, IMEC vzw and KULeuven, Heverlee
Andy Lambrechts, IMEC vzw and KULeuven, Heverlee
Murali Jayapala, IMEC vzw, Heverlee
Francky Catthoor, IMEC Belgium, Leuven
Diederik Verkest, IMEC vzw, Leuven
ABSTRACT
Reduced energy consumption is one of the most important design goals in embedded application domains such as wireless communication, multimedia, and biomedical applications. The instruction memory hierarchy has been shown to be one of the most power-hungry parts of the system. This paper introduces an architectural enhancement for the instruction memory that reduces energy consumption and improves performance. The proposed distributed instruction memory organization requires minimal hardware overhead and supports the execution of multiple incompatible loops in parallel in a uniprocessor system. We present different methods of implementing the loop controller architecture, compare them, and show that distributing the instruction memory also helps reduce the interconnect cost. This architectural enhancement can reduce the energy consumed in the instruction memory hierarchy by 59% and improve performance by 22% compared with enhanced hardware-based SMT architectures.
INDEX TERMS
RISC/CISC, VLIW architectures, Multithreaded processors, Support for multi-threaded execution, Low-power design, Energy-aware systems
CITATION
Praveen Raghavan, Andy Lambrechts, Murali Jayapala, Francky Catthoor, Diederik Verkest, "Distributed Loop Controller for Multithreading in Unithreaded ILP Architectures", IEEE Transactions on Computers, vol.58, no. 3, pp. 311-321, March 2009, doi:10.1109/TC.2008.168