Nigel Topham and Antonio González, "Randomized Cache Placement for Eliminating Conflicts," IEEE Transactions on Computers, vol. 48, no. 2, pp. 185-192, February 1999.
Abstract: Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors, conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking, which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficient parallelization leads to complex optimization problems for parallelizing compilers. This paper shows how introducing a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache whose miss ratio depends solely on working set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as by cache simulations of individual loop kernels that illustrate specific effects. We present measurements of Instructions committed Per Cycle (IPC) when comparing the performance of different cache architectures on whole-program benchmarks such as the SPEC95 suite.
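To make the idea of a pseudorandom index function concrete, the following is a minimal sketch of a direct-mapped cache model that compares a conventional index (low-order bits of the block address) with a hashed index. The abstract does not specify the hash function, so the XOR of two address bit fields used here is only an illustrative stand-in, and the names `conventional_index`, `xor_index`, `count_misses`, and `INDEX_BITS` are assumptions made for this example.

```python
def conventional_index(block_addr, index_bits):
    """Conventional placement: the set is the low-order bits of the block address."""
    return block_addr & ((1 << index_bits) - 1)


def xor_index(block_addr, index_bits):
    """Illustrative hashed placement (an assumption, not the paper's function):
    XOR the low-order index field with the next-higher field of the same width."""
    mask = (1 << index_bits) - 1
    return (block_addr ^ (block_addr >> index_bits)) & mask


def count_misses(block_addrs, index_fn, index_bits):
    """Simulate a direct-mapped cache and return the number of misses.
    Each set remembers the full block address of its resident block."""
    resident = {}
    misses = 0
    for block in block_addrs:
        cache_set = index_fn(block, index_bits)
        if resident.get(cache_set) != block:
            misses += 1
            resident[cache_set] = block
    return misses


INDEX_BITS = 8  # hypothetical cache with 256 sets

# Eight blocks whose stride equals the number of sets, swept ten times.
trace = [i * (1 << INDEX_BITS) for i in range(8)] * 10

print(count_misses(trace, conventional_index, INDEX_BITS))  # prints 80
print(count_misses(trace, xor_index, INDEX_BITS))           # prints 8
```

With the conventional index, all eight blocks map to set 0 and evict one another on every access. With the hashed index, they spread over eight sets, so only the eight cold misses remain. This is a toy case chosen to show the effect on a regular stride; it says nothing about cycle-time cost or about behavior on real workloads.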

