An Intelligent Cache System with Hardware Prefetching for High Performance
May 2003 (vol. 52 no. 5)
pp. 607-616

Abstract—In this paper, we present a high performance cache structure with a hardware prefetching mechanism that enhances exploitation of spatial and temporal locality. The proposed cache, which we call a Selective-Mode Intelligent (SMI) cache, consists of three parts: a direct-mapped cache with a small block size, a fully associative spatial buffer with a large block size, and a hardware prefetching unit. Temporal locality is exploited by selectively moving small blocks into the direct-mapped cache after monitoring their activity in the spatial buffer for a time period. Spatial locality is enhanced by intelligently prefetching a neighboring block when a spatial buffer hit occurs. The overhead of this prefetching operation is shown to be negligible. We also show that the prefetch operation is highly accurate: Over 90 percent of all prefetches generated are for blocks that are subsequently accessed. Our results show that the system enables the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance. Also, the SMI cache can reduce the miss ratio by around 20 percent and the average memory access time by 10 percent, compared with a victim-buffer cache configuration.
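
The three-part organization described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' design. The block sizes (8 and 32 bytes), the FIFO replacement of the spatial buffer, the choice of the next sequential large block as the "neighboring block", and the rule that touched small blocks are promoted when their large block leaves the buffer are all assumptions made for illustration. The class and field names are hypothetical.

```python
from collections import OrderedDict

SMALL = 8               # bytes per direct-mapped block (assumed value)
LARGE = 32              # bytes per spatial-buffer block (assumed value)
SUBS = LARGE // SMALL   # small blocks per large block


class SMICacheSketch:
    """Toy model: direct-mapped cache + fully associative spatial buffer + prefetch."""

    def __init__(self, dm_lines=4, buf_entries=2):
        self.dm_lines = dm_lines
        self.buf_entries = buf_entries      # use at least 2 so a prefetch cannot evict its trigger
        self.dm = {}                        # line index -> small-block number
        self.buf = OrderedDict()            # large-block number -> touched sub-block offsets (FIFO)
        self.stats = {"dm_hit": 0, "buf_hit": 0, "miss": 0, "prefetch": 0}

    def _install(self, large):
        """Place a large block in the spatial buffer, evicting the oldest entry if full."""
        if large in self.buf:
            return
        if len(self.buf) >= self.buf_entries:
            old, touched = self.buf.popitem(last=False)
            # Assumed promotion rule: only small blocks referenced while the
            # large block sat in the buffer move to the direct-mapped cache.
            for off in touched:
                small = old * SUBS + off
                self.dm[small % self.dm_lines] = small
        self.buf[large] = set()

    def access(self, addr):
        small, large = addr // SMALL, addr // LARGE
        if self.dm.get(small % self.dm_lines) == small:
            self.stats["dm_hit"] += 1
            return "dm_hit"
        if large in self.buf:
            self.buf[large].add(small % SUBS)
            self.stats["buf_hit"] += 1
            # Assumed prefetch rule: fetch the next sequential large block on a buffer hit.
            if large + 1 not in self.buf:
                self._install(large + 1)
                self.stats["prefetch"] += 1
            return "buf_hit"
        self.stats["miss"] += 1
        self._install(large)
        self.buf[large].add(small % SUBS)
        return "miss"


cache = SMICacheSketch(dm_lines=4, buf_entries=2)
trace = [cache.access(a) for a in (0, 8, 32, 0)]
print(trace)  # ['miss', 'buf_hit', 'buf_hit', 'dm_hit']
```

In this trace, the small block at address 0 is referenced while its large block is in the spatial buffer, so it is promoted to the direct-mapped cache when that large block is evicted, and the final access to address 0 hits there. The sketch says nothing about timing or hardware cost; it only shows how the two structures and the prefetch trigger might interact.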

Index Terms:
Memory hierarchy, dual data cache, temporal locality, spatial locality, prefetching.
Jung-Hoon Lee, Seh-woong Jeong, Shin-Dug Kim, Charles Weems, "An Intelligent Cache System with Hardware Prefetching for High Performance," IEEE Transactions on Computers, vol. 52, no. 5, pp. 607-616, May 2003, doi:10.1109/TC.2003.1197127