Introducing a New Cache Design into Vector Computers
December 1993 (vol. 42 no. 12)
pp. 1411-1424

Introduces an innovative cache design for vector computers, called the prime-mapped cache. By exploiting the special properties of a Mersenne prime, the new design neither lengthens the critical path of a processor nor increases the cache access time compared with existing cache organizations. The prime-mapped cache minimizes the cache misses caused by line interference, which previous investigators have shown to be critical for numerical applications. At negligible additional hardware cost, adding the proposed cache memory to an existing vector computer yields significant performance gains. The performance of the design is studied analytically using a generic vector computation model, and the analytical model is validated through extensive simulation experiments. A performance analysis for various vector access patterns shows that the prime-mapped cache performs significantly better than conventional cache organizations in the vector processing environment. The performance gain grows as the speed gap between processors and memories widens.
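
The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea, under stated assumptions: a cache with 2^c - 1 lines (a Mersenne prime) indexed by the line address modulo that prime, compared against a conventional direct-mapped cache with 2^c lines. The function names, the choice c = 7, and the stride and vector length are illustrative and are not taken from the paper. The sketch relies on the fact that 2^c is congruent to 1 modulo 2^c - 1, so the modulo reduces to adding c-bit fields, which suggests why such an index need not require a divider.

```python
C = 7                      # assumed field width; 2**7 - 1 = 127 is a Mersenne prime
PRIME_LINES = (1 << C) - 1  # 127 lines in the hypothetical prime-mapped cache
DIRECT_LINES = 1 << C       # 128 lines in the conventional direct-mapped cache


def mersenne_mod(addr: int, c: int = C) -> int:
    """Reduce addr modulo 2**c - 1 using only masks, shifts and adds."""
    m = (1 << c) - 1
    while addr > m:
        # 2**c is congruent to 1 (mod m), so the high field simply adds to the low field.
        addr = (addr & m) + (addr >> c)
    return 0 if addr == m else addr


def direct_index(addr: int) -> int:
    """Conventional index: the low c bits of the line address."""
    return addr & (DIRECT_LINES - 1)


def prime_index(addr: int) -> int:
    """Illustrative prime-mapped index: line address modulo 2**c - 1."""
    return mersenne_mod(addr, C)


def distinct_lines(stride: int, length: int, index_fn) -> int:
    """Count how many different cache lines a strided vector access touches."""
    return len({index_fn(i * stride) for i in range(length)})


if __name__ == "__main__":
    stride, length = 64, 100   # assumed access pattern, in units of cache lines
    print("direct-mapped lines used:", distinct_lines(stride, length, direct_index))
    print("prime-mapped lines used: ", distinct_lines(stride, length, prime_index))
```

With this assumed stride of 64 lines, the direct-mapped index sends all 100 elements to just 2 lines, while the prime-based index spreads them over 100 distinct lines, because 64 shares no factor with 127. This illustrates the kind of line interference the abstract refers to; it does not reproduce the paper's analytical model or its simulation results.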

Index Terms:
buffer storage; memory architecture; vector processor systems; cache design; vector computers; prime-mapped cache; Mersenne prime; cache miss ratio; performance gains; speed gap; cache organizations.
Qing Yang, "Introducing a New Cache Design into Vector Computers," IEEE Transactions on Computers, vol. 42, no. 12, pp. 1411-1424, Dec. 1993, doi:10.1109/12.260632