Orchestrating Horizontal Parallelism and Vertical Instruction Packing of Programs to Improve System Overall Efficiency
September 2009 (vol. 58 no. 9)
pp. 1211-1220
Hai Lin, University of Connecticut, Storrs
Yunsi Fei, University of Connecticut, Storrs
Both performance and energy efficiency are critical concerns for embedded systems and portable devices. Multi-issue processors can exploit the instruction-level parallelism (ILP) of programs to improve performance greatly, but usually at the cost of higher energy and power consumption. Reducing energy consumption while maintaining the high performance of programs running on multi-issue processors remains a challenging problem. In this paper, we propose a novel approach that applies the instruction register file (IRF) technique, originally developed for single-issue processors, to the VLIW architecture. Frequently executed instructions are selected and placed in the on-chip IRF for fast, energy-efficient access during program execution. Violation of synchronization among VLIW instruction slots is avoided by introducing new instruction formats and microarchitectural support. The enhanced VLIW architecture is thus able to orchestrate horizontal instruction parallelism and vertical instruction packing of programs to improve overall system efficiency. Our experimental results show that the proposed processor architecture achieves both the performance advantage of the VLIW architecture and the high energy efficiency of the IRF-based instruction packing technique; for example, fetch energy consumption is reduced by 33.4 percent for a 4-way VLIW architecture with 16-entry IRFs on the SPEC2000 benchmarks.
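The idea of placing frequently executed instructions in a small IRF can be illustrated with a minimal Python sketch. It is not the authors' selection algorithm, which the abstract does not describe; it assumes a simple frequency ranking over a dynamic instruction trace. The function names, the trace contents, and the 2-entry IRF used in the example are all hypothetical.

```python
from collections import Counter


def select_irf_entries(trace, irf_size=16):
    """Return the irf_size most frequently executed instructions in the trace.

    Assumption: plain frequency ranking; a real selection policy may also
    weigh encoding constraints that are not modeled here.
    """
    counts = Counter(trace)
    return [insn for insn, _ in counts.most_common(irf_size)]


def irf_hit_fraction(trace, irf_entries):
    """Fraction of dynamic instructions that could be read from the IRF."""
    if not trace:
        return 0.0
    in_irf = set(irf_entries)
    return sum(1 for insn in trace if insn in in_irf) / len(trace)


# Hypothetical dynamic trace: each string stands for one executed instruction.
trace = ["add r1,r2,r3"] * 5 + ["lw r4,0(r5)"] * 3 + ["sub r6,r7,r8"] * 2

entries = select_irf_entries(trace, irf_size=2)
fraction = irf_hit_fraction(trace, entries)
```

In this toy trace, the two most frequent instructions account for 8 of the 10 executed instructions, so most fetches could in principle be served from the IRF instead of the instruction cache. The sketch says nothing about the actual energy savings, which depend on the hardware model.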


Index Terms:
Microprocessors, VLIW architecture, instruction register file, energy efficiency, ILP.
Citation:
Hai Lin, Yunsi Fei, "Orchestrating Horizontal Parallelism and Vertical Instruction Packing of Programs to Improve System Overall Efficiency," IEEE Transactions on Computers, vol. 58, no. 9, pp. 1211-1220, Sept. 2009, doi:10.1109/TC.2009.41