GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
June 2005 (vol. 54 no. 6)
pp. 752-766
Reducing the energy consumption of a real-time system has emerged as an important design concern. In this paper, we propose GAARP, an adaptive scalable architecture targeted toward algorithm-specific tasks for just-in-time performance using the right amount of power. The architecture consists of Globally Asynchronous and Locally Synchronous (GALS) building blocks, where the processing hardware is realized by a set of smaller slices of similar structure, each running synchronously with independent clocks. We demonstrate that, for different real-time commercial applications with algorithm-specific jobs like online transaction processing, digital filtering, Fourier transform, etc., the proposed architecture allows dynamic load-balancing and adaptive intertask voltage scaling based on the load in each of the processing units. Compared to a synchronous implementation of the same functionality, we show that the proposed hardware can achieve higher efficiency in terms of power and performance by exploiting the flexibility to balance the load and change the supply voltage. The architecture also lends itself to process tolerance since it can detect process-shifts for the individual processing units and determine the appropriate operating voltage/frequency for each unit. Simulation results for two representative applications show that, for a modest system configuration and random job distribution, we obtain up to 67 percent improvement in MOPS/W (millions of operations per second per watt) over a fully synchronous implementation.
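The interplay between load balancing and per-slice voltage scaling described above can be illustrated with a small sketch. It is not the authors' algorithm or simulator: the operating points, the greedy balancing rule, the function names, and the energy model (dynamic energy proportional to V² times the number of operations) are all assumptions chosen only to show why independently clocked slices can spend less energy than a single global voltage/frequency setting.

```python
from dataclasses import dataclass

# Hypothetical operating points (supply voltage in V, clock in MHz), lowest first.
# These values are illustrative assumptions, not figures from the paper.
OPERATING_POINTS = [(0.8, 100.0), (1.0, 200.0), (1.2, 300.0)]


@dataclass
class Slice:
    queued_ops: float = 0.0  # pending work, in millions of operations


def balance(jobs, n_slices):
    """Greedy balancing: hand each job (largest first) to the least-loaded slice."""
    slices = [Slice() for _ in range(n_slices)]
    for ops in sorted(jobs, reverse=True):
        min(slices, key=lambda s: s.queued_ops).queued_ops += ops
    return slices


def pick_point(queued_ops, deadline_s, ops_per_cycle=1.0):
    """Return the lowest operating point that finishes the queue by the deadline."""
    for volts, mhz in OPERATING_POINTS:
        # mhz * ops_per_cycle is throughput in millions of operations per second.
        if queued_ops / (mhz * ops_per_cycle) <= deadline_s:
            return volts, mhz
    return OPERATING_POINTS[-1]  # deadline cannot be met; run as fast as possible


def relative_energy(slices, deadline_s, per_slice=True):
    """Relative dynamic energy, assuming energy per operation scales with V^2."""
    points = [pick_point(s.queued_ops, deadline_s) for s in slices]
    if not per_slice:
        # Single global setting: every slice runs at the most demanding point.
        points = [max(points)] * len(slices)
    return sum(v * v * s.queued_ops for (v, _), s in zip(points, slices))


if __name__ == "__main__":
    slices = balance([300, 40, 30], n_slices=2)
    print(relative_energy(slices, deadline_s=1.5, per_slice=True))
    print(relative_energy(slices, deadline_s=1.5, per_slice=False))
```

In this toy setup the lightly loaded slice can drop to the lowest operating point while the heavily loaded slice stays at a higher one, so the per-slice total comes out below the single-setting total. The size of that gap depends entirely on the assumed operating points and job mix and says nothing about the 67 percent figure reported in the abstract.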


Index Terms:
Asynchronous/synchronous operations, algorithms implemented in hardware, fault tolerance, energy-aware systems.
Citation:
Swarup Bhunia, Animesh Datta, Nilanjan Banerjee, Kaushik Roy, "GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks," IEEE Transactions on Computers, vol. 54, no. 6, pp. 752-766, June 2005, doi:10.1109/TC.2005.99