A Trace-Capable Instruction Cache for Cost-Efficient Real-Time Program Trace Compression in SoC
December 2011 (vol. 60 no. 12)
pp. 1665-1677
Chun-Hung Lai, National Sun Yat-Sen University, Kaohsiung
Fu-Ching Yang, National Sun Yat-Sen University, Kaohsiung
Ing-Jer Huang, National Sun Yat-Sen University, Kaohsiung
This paper presents a novel approach that makes the on-chip instruction cache of an SoC function simultaneously as a regular instruction cache and a real-time program trace compressor, named the trace-capable cache (TC-cache). This is accomplished by exploiting the dictionary feature of the instruction cache with a small support circuit attached to the side of the cache. Compared with related work, this approach has the advantage of reusing the existing instruction cache, which is indispensable in modern SoCs, and thus saves a significant amount of hardware resources and power. The TC-cache can be configured to work simultaneously as the instruction cache and the trace compressor (the online mode), or exclusively as the trace compressor (the bypass mode). An RTL implementation of a 4 KB trace-capable instruction cache, a 4 KB data cache, and an academic ARM processor core has been completed. Experiments show that the TC-cache achieves an average compression ratio of 90 percent with a very small hardware overhead of 3,652 gates (1.1 percent). Online mode operation requires only 0.2 percent additional system power. In addition, the trace support circuit does not impair the global critical path. The proposed approach is therefore a highly feasible on-chip debugging/monitoring solution for SoCs, even for cost-sensitive ones such as consumer electronics. Furthermore, the same concept can be applied to the data cache to compress the data address trace as well.
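
The idea of treating a cache as a compression dictionary can be illustrated with a small software model. The sketch below is not the TC-cache circuit or its trace format, neither of which is described in this abstract. It assumes a direct-mapped cache with 16-byte lines, 32-bit addresses, and a hypothetical record encoding: a hit is emitted as a short (index, offset) record and a miss as the full address. All names and sizes (`compress`, `decompress`, `LINE_BYTES`, and so on) are illustrative assumptions. Only the 4 KB capacity is taken from the abstract.

```python
# Toy model: a direct-mapped cache tag array used as a trace dictionary.
# All parameters except the 4 KB capacity are assumptions for illustration.
ADDR_BITS = 32     # assumed instruction address width
LINE_BYTES = 16    # assumed cache line size
NUM_LINES = 256    # 256 lines * 16 B = 4 KB
INDEX_BITS = 8     # log2(NUM_LINES)
OFFSET_BITS = 4    # log2(LINE_BYTES)


def compress(addresses):
    """Turn an address trace into hit/miss records (hypothetical format)."""
    tags = [None] * NUM_LINES
    records = []
    for addr in addresses:
        line = addr // LINE_BYTES
        index, tag = line % NUM_LINES, line // NUM_LINES
        if tags[index] == tag:
            # Hit: the decoder's mirrored tag array already knows the tag,
            # so only the index and the byte offset are emitted.
            records.append(("H", index, addr % LINE_BYTES))
        else:
            # Miss: emit the full address and update the dictionary.
            tags[index] = tag
            records.append(("M", addr))
    return records


def decompress(records):
    """Rebuild the address trace by replaying the same tag updates."""
    tags = [None] * NUM_LINES
    addresses = []
    for rec in records:
        if rec[0] == "H":
            _, index, offset = rec
            line = tags[index] * NUM_LINES + index
            addresses.append(line * LINE_BYTES + offset)
        else:
            addr = rec[1]
            line = addr // LINE_BYTES
            tags[line % NUM_LINES] = line // NUM_LINES
            addresses.append(addr)
    return addresses


def encoded_bits(records):
    """Size of the record stream: one flag bit plus the payload."""
    hit_cost = 1 + INDEX_BITS + OFFSET_BITS
    miss_cost = 1 + ADDR_BITS
    return sum(hit_cost if r[0] == "H" else miss_cost for r in records)


def bits_saved(addresses):
    """Fraction of raw address bits removed by the toy encoding."""
    records = compress(addresses)
    return 1 - encoded_bits(records) / (ADDR_BITS * len(addresses))


# A loop of eight 4-byte instructions executed 100 times.
trace = [0x8000 + 4 * i for i in range(8)] * 100
assert decompress(compress(trace)) == trace
print(f"bits saved on the toy loop trace: {bits_saved(trace):.1%}")
```

Because the decoder replays the same tag updates as the encoder, the two dictionaries stay in step and the trace is recovered exactly. The savings printed for this toy trace say nothing about the 90 percent figure reported above, which comes from the authors' hardware and benchmarks.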

Index Terms:
Program trace, compression, cache, real time.
Chun-Hung Lai, Fu-Ching Yang, Ing-Jer Huang, "A Trace-Capable Instruction Cache for Cost-Efficient Real-Time Program Trace Compression in SoC," IEEE Transactions on Computers, vol. 60, no. 12, pp. 1665-1677, Dec. 2011, doi:10.1109/TC.2010.194