Compressing Cache State for Postsilicon Processor Debug
April 2011 (vol. 60 no. 4)
pp. 484-497
Preeti Ranjan Panda, Indian Institute of Technology Delhi, New Delhi
M. Balakrishnan, Indian Institute of Technology Delhi, New Delhi
Anant Vishnoi, Indian Institute of Technology Delhi, New Delhi
During postsilicon processor debugging, we frequently need to capture and dump the internal state of the processor. Internal state comprises all memory elements, the bulk of which is cache, so the problem is essentially that of transferring cache contents off-chip to a logic analyser. To reduce the transfer time and save expensive logic analyser memory, we propose to compress the cache contents on their way out. We present a hardware compression engine for cache data that uses a Cache-Aware Compression strategy, which exploits knowledge of the cache fields and their behavior to compress effectively. Experimental results indicate that the technique achieves 7-31 percent better compression than an approach that treats the data as one long bit stream. We also describe and evaluate a parallel compression architecture that uses multiple compression engines, resulting in a 54 percent reduction in transfer time.
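
The contrast the abstract draws, between treating the cache dump as one long bit stream and exploiting knowledge of the cache fields, can be illustrated with a small software sketch. Everything below is an assumption made for illustration: the `CacheLine` layout, the field widths, and the function names are hypothetical, and Python's `zlib` stands in for the paper's hardware compression engine, which the abstract does not specify. The sketch only shows how a dump might be regrouped by field before compression.

```python
import zlib
from typing import List, NamedTuple, Tuple


class CacheLine(NamedTuple):
    """Hypothetical cache line layout (field widths are assumptions)."""
    tag: int     # stored in 3 bytes here
    state: int   # valid/dirty/replacement bits packed into 1 byte
    data: bytes  # 32-byte data block


def dump_as_bitstream(lines: List[CacheLine]) -> bytes:
    """Serialize in storage order: tag, state, data, tag, state, data, ..."""
    out = bytearray()
    for line in lines:
        out += line.tag.to_bytes(3, "big")
        out.append(line.state)
        out += line.data
    return bytes(out)


def dump_by_field(lines: List[CacheLine]) -> Tuple[bytes, bytes, bytes]:
    """Regroup the same bytes so each field forms its own stream."""
    tags = b"".join(line.tag.to_bytes(3, "big") for line in lines)
    states = bytes(line.state for line in lines)
    data = b"".join(line.data for line in lines)
    return tags, states, data


def compressed_size_flat(lines: List[CacheLine]) -> int:
    """Compressed size when the dump is treated as one long stream."""
    return len(zlib.compress(dump_as_bitstream(lines)))


def compressed_size_field_aware(lines: List[CacheLine]) -> int:
    """Total compressed size when each field stream is compressed separately."""
    return sum(len(zlib.compress(stream)) for stream in dump_by_field(lines))
```

Comparing `compressed_size_flat` with `compressed_size_field_aware` on a real cache dump gives a rough feel for the idea, but the result depends on the data and on the compressor. A general-purpose software compressor with per-stream overhead will not necessarily reproduce the 7-31 percent improvement reported for the hardware engine.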

Index Terms:
Postsilicon validation, processor debug, cache compression.
Preeti Ranjan Panda, M. Balakrishnan, Anant Vishnoi, "Compressing Cache State for Postsilicon Processor Debug," IEEE Transactions on Computers, vol. 60, no. 4, pp. 484-497, April 2011, doi:10.1109/TC.2010.123