A Comparative Analysis of Cache Designs for Vector Processing
March 1999 (vol. 48 no. 3)
pp. 331-344

Abstract: This paper presents an experimental study of cache memory designs for vector computers. We use an execution-driven simulator to evaluate the vector cache performance of a set of application programs from the Perfect Club and SPEC92 benchmark suites. Our simulation results uncover several facts that were previously unknown. First, the prime-mapped cache that we recently proposed shows great performance potential in a vector processing environment. Because of its conflict-free property, the prime-mapped cache performs significantly better than conventional cache designs for all applications considered. Second, performance results on the benchmarks indicate that data locality does exist in vector processing, although the effects of line size, associativity, replacement algorithm, and prefetching scheme on cache performance are very different from what has been commonly believed. A medium-size vector cache (e.g., 128 Kbytes) eliminates the need for a large number of interleaved memory banks in vector computers. Our experiments show that a vector computer with a medium-size prime-mapped cache, a small cache line size, and a limited amount of prefetching provides significant speedup over conventional vector computers without a cache. The performance results reported in this paper can also guide general-purpose computer designers in enhancing cache performance for numerical applications.
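The conflict-free property mentioned in the abstract can be illustrated with a small sketch. It is not the simulator used in the paper. It assumes that a prime-mapped cache has a Mersenne-prime number of lines (here 2^13 - 1 = 8191) and that a line index is the line address modulo that count. The function name `distinct_lines`, the line counts, the strides, and the vector length of 1024 are all hypothetical choices made for illustration. The sketch counts how many distinct cache lines a strided vector access touches under each mapping.

```python
def distinct_lines(num_lines, stride, length, base=0):
    """Count distinct cache lines touched by a strided vector access.

    Addresses are in units of cache lines (an assumption of this sketch);
    the line index is simply the address modulo the number of lines.
    """
    return len({(base + i * stride) % num_lines for i in range(length)})


POW2_LINES = 8192   # conventional direct-mapped cache: 2**13 lines
PRIME_LINES = 8191  # assumed prime-mapped cache: 2**13 - 1 lines (a Mersenne prime)
VECTOR_LENGTH = 1024

for stride in (1, 2, 512, 8192):
    conventional = distinct_lines(POW2_LINES, stride, VECTOR_LENGTH)
    prime_mapped = distinct_lines(PRIME_LINES, stride, VECTOR_LENGTH)
    print(f"stride {stride:5d}: conventional {conventional:4d} lines, "
          f"prime-mapped {prime_mapped:4d} lines")
```

Under these assumptions, a power-of-two stride such as 512 folds the whole vector onto only a few lines of the conventional cache, so elements evict one another. With a prime number of lines, any stride that is not a multiple of the line count spreads the same vector over as many distinct lines as it has elements.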

Index Terms:
Performance evaluation, cache memories, memory hierarchy, vector processing, simulation, benchmarks.
Citation:
Tong Sun, Qing Yang, "A Comparative Analysis of Cache Designs for Vector Processing," IEEE Transactions on Computers, vol. 48, no. 3, pp. 331-344, March 1999, doi:10.1109/12.754999