Low-Power High-Performance Reconfigurable Computing Cache Architectures
October 2004 (vol. 53 no. 10)
pp. 1274-1290
The demand for higher computing power, and thus for more on-chip computing resources, is ever increasing. The size of on-chip cache memory has also increased consistently to keep pace with developments in implementation technology. However, some applications do not use the full cache capacity and instead require more computing resources. To use on-chip silicon real estate efficiently, we exploit the possibility of using a part of the cache memory for computation, striking a balance between memory and computing resources for various applications. In an earlier part of our work, we developed the Adaptive Balanced Computing (ABC) architecture, in which a module of an L1 data cache is used as a coprocessor controlled by the main processor. A part of the L1 data cache is designed as a Reconfigurable Functional Cache (RFC) that can be configured to perform a selected core function of a media application whenever such computing capability is required. The ABC architecture provides speedups ranging from 1.04x to 5.0x for various media applications. In this paper, we show that the reduced number of cache accesses and the lower utilization of other on-chip resources, both due to a significant reduction in the execution time of the application, result in power savings. For this purpose, the paper first develops a model to compute the power consumed by the RFC while accelerating the computation of multimedia applications. The results show a reduction in power consumption of up to 60 percent for MPEG decoding and of 10 to 20 percent for various other multimedia applications. Beyond the discussion in earlier work on the ABC architecture, this paper presents a detailed circuit-level implementation of the core functions in the RFC modules. We further study the impact of converting a conventional cache into an RFC on both access time and energy consumption. The analysis covers a wide spectrum of cache organizations, with sizes from 8KB to 256KB and varying set associativity.

Index Terms:
On-chip data cache, adaptive computing, multimedia processing, cache access time, cache energy dissipation.
Citation:
Rama Sangireddy, Huesung Kim, Arun K. Somani, "Low-Power High-Performance Reconfigurable Computing Cache Architectures," IEEE Transactions on Computers, vol. 53, no. 10, pp. 1274-1290, Oct. 2004, doi:10.1109/TC.2004.80