Issue No.07 - July (2008 vol.57)
pp: 916-927
As processor speeds increase, the on-chip memory hierarchy will remain crucial for performance. Unfortunately, simply increasing the size of the on-chip caches yields diminishing returns, and memory-bound applications may suffer from limited off-chip bandwidth. This paper focuses on memory-link compression schemes. The first contribution is a framework for identifying the nature of the value locality exploited by published schemes. This framework is then used to establish quantitatively what type of value locality each compression scheme exploits. We find that as many as 40% of the values transferred in integer, media, and commercial applications are small integers that can be coded using fewer than 8 bits. By leveraging small-value locality, 35% of the bandwidth can be freed up. Another significant share of the values either forms clusters in the value space or belongs to a fairly small group of frequent isolated values. By leveraging this category, one can free up 70% of the bandwidth. Finally, we contribute a new compression scheme that exploits multiple value-locality categories and is shown to free up 75% of the bandwidth.
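To make the small-value category concrete, the following is a minimal sketch of how small-value locality could be measured on a stream of transferred words. It is not the scheme evaluated in the paper: the 32-bit word size, the 7-bit signed payload, the one-bit flag per word, and the function names `fits_small`, `compressed_bits`, and `bandwidth_freed` are all assumptions made for illustration.

```python
from typing import Iterable, List


def fits_small(value: int, small_bits: int = 7, word_bits: int = 32) -> bool:
    """Return True if the word, read as a signed two's-complement
    integer, can be represented in `small_bits` bits (assumed width)."""
    value &= (1 << word_bits) - 1          # keep only the low word_bits bits
    if value >= 1 << (word_bits - 1):      # reinterpret as a signed value
        value -= 1 << word_bits
    lo = -(1 << (small_bits - 1))
    hi = (1 << (small_bits - 1)) - 1
    return lo <= value <= hi


def compressed_bits(words: Iterable[int], small_bits: int = 7,
                    word_bits: int = 32) -> int:
    """Bits needed when every word carries a 1-bit flag followed by
    either a short payload (small value) or the full word (assumed format)."""
    total = 0
    for w in words:
        payload = small_bits if fits_small(w, small_bits, word_bits) else word_bits
        total += 1 + payload
    return total


def bandwidth_freed(words: List[int], small_bits: int = 7,
                    word_bits: int = 32) -> float:
    """Fraction of the uncompressed link traffic that is saved."""
    if not words:
        return 0.0
    raw = len(words) * word_bits
    return 1.0 - compressed_bits(words, small_bits, word_bits) / raw


if __name__ == "__main__":
    # Hypothetical sample: three small values (0, 1, -1) and one large word.
    sample = [0, 1, 0xFFFFFFFF, 0x12345678]
    print(f"freed: {bandwidth_freed(sample):.1%}")
```

In this sketch a word that does not fit costs one bit more than it would uncompressed, so the scheme only pays off when small values are common enough to outweigh that flag overhead.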
I/O and Data Communications, Data compaction and compression, Memory Structures
Martin Thuresson, Per Stenstrom, "Memory-Link Compression Schemes: A Value Locality Perspective", IEEE Transactions on Computers, vol.57, no. 7, pp. 916-927, July 2008, doi:10.1109/TC.2008.28