Fuzzy Memoization for Floating-Point Multimedia Applications
July 2005 (vol. 54 no. 7)
pp. 922-927
Instruction memoization is a promising technique to reduce the power consumption and increase the performance of future low-end/mobile multimedia systems. Power and performance efficiency can be improved by reusing the results of previously executed instances of an operation. Unfortunately, this technique may not always be worth the effort because of the power consumption and area impact of the tables required to achieve an adequate level of reuse. In this paper, we introduce and evaluate a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: performance and power consumption can be improved at the cost of small precision losses in computation. By exploiting this implicit characteristic of multimedia applications, we propose a new technique called tolerant memoization. This technique expands the capabilities of classic memoization by associating similar, not only identical, inputs with the same output. We evaluate this new technique by measuring the effect of tolerant memoization for floating-point operations in a low-power multimedia processor, and we discuss the trade-offs between performance and the quality of the media outputs. We report energy improvements of 12 percent for a set of key multimedia applications with a small lookup table (LUT) of 6 Kbytes, compared with 3 percent obtained using previously proposed techniques.
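The abstract describes tolerant memoization only at a high level. The Python sketch below illustrates one plausible reading of it, assuming that "similar inputs" are detected by ignoring a configurable number of low-order float32 mantissa bits when matching operands in a direct-mapped table. The class name `TolerantMemoTable`, the table size, the `tolerance_bits` parameter and the index hash are all hypothetical and are not taken from the paper; with `tolerance_bits=0` the table behaves like classic, exact memoization.

```python
import operator
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


class TolerantMemoTable:
    """Hypothetical direct-mapped memo table for a binary FP operation.

    Operands are matched after clearing their `tolerance_bits` least-significant
    mantissa bits, so nearby inputs share one entry and one stored output.
    tolerance_bits=0 degenerates to classic (exact) memoization.
    """

    def __init__(self, op, entries: int = 256, tolerance_bits: int = 0):
        assert entries > 0 and entries & (entries - 1) == 0, "entries must be a power of two"
        assert 0 <= tolerance_bits <= 23, "float32 has a 23-bit mantissa"
        self.op = op
        self.entries = entries
        self.mask = (0xFFFFFFFF << tolerance_bits) & 0xFFFFFFFF
        self.tags = [None] * entries   # masked (a, b) operand keys per entry
        self.values = [0.0] * entries  # cached result per entry
        self.hits = 0
        self.misses = 0

    def _key(self, x: float) -> int:
        # Drop the low mantissa bits: this is where precision is traded for reuse.
        return float32_bits(x) & self.mask

    def lookup(self, a: float, b: float) -> float:
        ka, kb = self._key(a), self._key(b)
        # Assumed index function; hardware would use a cheap XOR-fold instead.
        index = hash((ka, kb)) & (self.entries - 1)
        if self.tags[index] == (ka, kb):
            self.hits += 1
            return self.values[index]  # reused result; may differ slightly from op(a, b)
        self.misses += 1
        result = self.op(a, b)  # computed in Python double precision, not float32
        self.tags[index] = (ka, kb)
        self.values[index] = result
        return result


# Usage: a tolerant multiplier that ignores the 8 lowest mantissa bits.
fuzzy = TolerantMemoTable(operator.mul, entries=256, tolerance_bits=8)
fuzzy.lookup(1.0, 3.0)           # miss: computes and stores 3.0
fuzzy.lookup(1.0 + 2**-20, 3.0)  # hit: reuses 3.0 although the exact product is slightly larger
```

Raising `tolerance_bits` lets more distinct operand pairs hit the same entry, which is the performance-versus-output-quality trade-off the abstract refers to; how many bits a given application can tolerate is not something this sketch determines.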


Index Terms: Low-power design, special-purpose and application-based systems, real-time and embedded systems.
Carlos Álvarez, Jesús Corbal, Mateo Valero, "Fuzzy Memoization for Floating-Point Multimedia Applications," IEEE Transactions on Computers, vol. 54, no. 7, pp. 922-927, July 2005, doi:10.1109/TC.2005.119