Minimizing Area Cost of On-Chip Cache Memories by Caching Address Tags
November 1997 (vol. 46 no. 11)
pp. 1187-1201

Abstract: This paper presents a technique for minimizing the chip-area cost of implementing on-chip cache memory in microprocessors. The main idea is Caching Address Tags, or the CAT cache for short. The CAT cache exploits the locality that exists among the addresses of memory references. By keeping only a limited number of distinct tags for cached data, rather than as many tags as there are cache lines, the CAT cache can reduce the cost of implementing tag memory by an order of magnitude without a noticeable performance difference from ordinary caches. In this sense, CAT represents another level of caching for cache memories. Simulation experiments evaluate the performance of the CAT cache against existing caches. Results for SPEC92 programs show that the CAT cache, with only a few tag entries, performs as well as ordinary caches while saving significant chip area. This area saving grows as the processor's address space increases. By allocating the saved chip area to larger cache capacity or more powerful functional units, CAT is expected to have a substantial impact on overall system performance.
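
The tag-count argument in the abstract can be made concrete with a rough bit count. The sketch below is not the paper's area model. It assumes a direct-mapped cache in which each line stores a short index into a small shared table of full tags, and it counts storage bits only. The function names, the 8 KB cache with 32-byte lines, and the 16-entry tag table are illustrative assumptions, not figures from the paper.

```python
def tag_width(address_bits: int, cache_bytes: int, line_bytes: int) -> int:
    """Tag width of a direct-mapped cache (sizes assumed to be powers of two)."""
    num_lines = cache_bytes // line_bytes
    index_bits = (num_lines - 1).bit_length()
    offset_bits = (line_bytes - 1).bit_length()
    return address_bits - index_bits - offset_bits


def conventional_tag_bits(num_lines: int, tag_bits: int) -> int:
    """Tag storage when every cache line holds its own full tag."""
    return num_lines * tag_bits


def cat_tag_bits(num_lines: int, tag_bits: int, num_tags: int) -> int:
    """Tag storage when each line holds only an index into a small tag table.

    Assumption (hypothetical layout): one pointer per line plus `num_tags`
    full-width tags in the shared table.
    """
    pointer_bits = (num_tags - 1).bit_length()
    return num_lines * pointer_bits + num_tags * tag_bits


if __name__ == "__main__":
    cache_bytes, line_bytes, num_tags = 8 * 1024, 32, 16  # illustrative values
    num_lines = cache_bytes // line_bytes
    for address_bits in (32, 64):
        t = tag_width(address_bits, cache_bytes, line_bytes)
        conv = conventional_tag_bits(num_lines, t)
        cat = cat_tag_bits(num_lines, t, num_tags)
        print(f"{address_bits}-bit addresses: tag={t} bits, "
              f"conventional={conv} bits, CAT-style={cat} bits")
```

Under these assumptions the per-line cost no longer depends on the address width; only the small tag table does. That matches the abstract's remark that the saving grows with the address space, although the exact ratios here depend entirely on the assumed geometry.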

Index Terms:
Cache memory, single-chip processor, on-chip cache, memory hierarchy, performance evaluation.
Hong Wang, Tong Sun, Qing Yang, "Minimizing Area Cost of On-Chip Cache Memories by Caching Address Tags," IEEE Transactions on Computers, vol. 46, no. 11, pp. 1187-1201, Nov. 1997, doi:10.1109/12.644293