Sequential Hardware Prefetching in Shared-Memory Multiprocessors
July 1995 (vol. 6 no. 7)
pp. 733-746

Abstract—To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware techniques is that they need no support from the programmer or compiler.

Sequential prefetching is a simple hardware-controlled prefetching technique that automatically prefetches the consecutive blocks following the block that misses in the cache, thus exploiting spatial locality. In its simplest form, the number of blocks prefetched on each miss is fixed throughout the execution. However, since prefetching efficiency varies during the execution of a program, we propose to adapt the number of prefetched blocks according to a dynamic measure of prefetching effectiveness. Simulations of this adaptive scheme show that it reduces the number of read misses, the read penalty, and the execution time by up to 78%, 58%, and 25%, respectively.
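The difference between the fixed and adaptive variants can be illustrated with a toy model. The sketch below is not the paper's hardware mechanism, which the abstract does not specify. It assumes a fully associative LRU cache of block numbers, measures effectiveness as the fraction of prefetched blocks that are later read, and raises or lowers the prefetch degree by one after every `epoch` issued prefetches. The class name, the thresholds `high` and `low`, the epoch length, and the minimum degree of 1 are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class SequentialPrefetchCache:
    """Toy fully associative LRU cache with sequential prefetch on read misses.

    All parameters are illustrative assumptions, not values from the paper.
    """

    def __init__(self, capacity, degree=1, adaptive=False,
                 epoch=8, high=0.75, low=0.40, max_degree=8):
        self.capacity = capacity      # number of blocks the cache holds
        self.degree = degree          # blocks prefetched per read miss
        self.adaptive = adaptive      # False: fixed degree; True: adapt it
        self.epoch = epoch            # issued prefetches between adjustments
        self.high = high              # raise degree above this useful fraction
        self.low = low                # lower degree below this useful fraction
        self.max_degree = max_degree
        # Maps block number -> True while the block was prefetched
        # and has not yet been read by the processor.
        self.blocks = OrderedDict()
        self.read_misses = 0
        self.epoch_issued = 0
        self.epoch_useful = 0

    def _install(self, block, prefetched):
        """Bring a block into the cache, evicting the LRU block if full."""
        if block in self.blocks:
            return False
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)
        self.blocks[block] = prefetched
        return True

    def _adapt(self):
        """Adjust the degree from the fraction of useful prefetches."""
        ratio = self.epoch_useful / self.epoch_issued
        if ratio > self.high:
            self.degree = min(self.degree + 1, self.max_degree)
        elif ratio < self.low:
            self.degree = max(self.degree - 1, 1)  # assumed floor of 1
        self.epoch_issued = self.epoch_useful = 0

    def read(self, block):
        """Read one block; return True on a hit, False on a miss."""
        if block in self.blocks:
            if self.blocks[block]:
                # First read of a prefetched block: the prefetch was useful.
                self.blocks[block] = False
                self.epoch_useful += 1
            self.blocks.move_to_end(block)
            return True
        self.read_misses += 1
        self._install(block, prefetched=False)
        # Sequential prefetch: fetch the blocks that follow the missing one.
        for k in range(1, self.degree + 1):
            if self._install(block + k, prefetched=True):
                self.epoch_issued += 1
        if self.adaptive and self.epoch_issued >= self.epoch:
            self._adapt()
        return False


def count_read_misses(trace, **kwargs):
    """Run a trace of block numbers through a fresh cache; return its misses."""
    cache = SequentialPrefetchCache(capacity=32, **kwargs)
    for block in trace:
        cache.read(block)
    return cache.read_misses
```

On a purely sequential trace such as `range(64)`, `count_read_misses(range(64), degree=0)` misses on every block, a fixed `degree=1` halves the misses, and `adaptive=True` lowers them further because the degree grows while prefetches keep being used. On a trace with no spatial locality the same rule shrinks the degree, which limits wasted prefetches.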

Index Terms:
Hardware-controlled prefetching, latency tolerance, memory consistency models, performance evaluation, sequential prefetching, shared-memory multiprocessors.
Citation:
Fredrik Dahlgren, Michel Dubois, Per Stenström, "Sequential Hardware Prefetching in Shared-Memory Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 7, pp. 733-746, July 1995, doi:10.1109/71.395402