Massively Parallel Algorithms for Trace-Driven Cache Simulations
August 1994 (vol. 5 no. 8)
pp. 849-859

This paper considers the use of massively parallel architectures to execute a trace-driven simulation of a single cache set. A method is presented for the least-recently-used (LRU) policy which, regardless of the set size C, runs in O(log N) time on a trace of length N using N processors on the EREW (exclusive read, exclusive write) parallel model. A simpler LRU simulation algorithm is also given that runs in O(C log N) time using N/log N processors; timings of its implementation on the MasPar MP-1, a machine with 16384 processors, are presented. A broad class of reference-based line replacement policies is considered, including LRU as well as the least-frequently-used (LFU) and random replacement policies. For any such policy, a simulation method is presented that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
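A trace can be split across processors at all because, under LRU, each reference's outcome is fixed by a window of the trace: a reference to line x hits in a C-line set exactly when fewer than C distinct other lines were referenced since the previous reference to x (the classical LRU stack-distance property). The Python sketch below illustrates that property; it is not the authors' EREW algorithm. `lru_hits_sequential` is a conventional one-reference-at-a-time simulator, and `lru_hits_by_stack_distance` decides every reference independently from the trace alone, the kind of per-reference computation a parallel method can distribute. Both function names are hypothetical. The per-reference check here costs O(N) work per reference; the O(log N) and O(C log N) bounds above come from the paper's own techniques, which this sketch does not reproduce.

```python
from collections import OrderedDict
from typing import Hashable, List, Optional, Sequence


def lru_hits_sequential(trace: Sequence[Hashable], capacity: int) -> List[bool]:
    """Simulate one LRU set of `capacity` lines, one reference at a time.

    Returns a list with True for each reference that hits.
    """
    cache: OrderedDict = OrderedDict()  # keys ordered least to most recently used
    hits: List[bool] = []
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # now the most recently used line
            hits.append(True)
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = None
            hits.append(False)
    return hits


def lru_hits_by_stack_distance(trace: Sequence[Hashable], capacity: int) -> List[bool]:
    """Decide each reference from its own window of the trace.

    Reference i to line x hits iff x occurred earlier and fewer than `capacity`
    distinct lines occur strictly between the previous x and position i.
    Each loop iteration reads only the trace, never another iteration's
    result, so the iterations are independent of one another.
    """
    hits: List[bool] = []
    for i, line in enumerate(trace):
        # Scan backwards for the most recent earlier reference to the same line.
        prev: Optional[int] = next(
            (j for j in range(i - 1, -1, -1) if trace[j] == line), None
        )
        if prev is None:
            hits.append(False)  # first reference: compulsory miss
        else:
            distinct_between = len(set(trace[prev + 1 : i]))
            hits.append(distinct_between < capacity)
    return hits
```

For example, on the trace `a b c a d b a c` with C = 3, both functions report hits only at the fourth and seventh references.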

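The policy-generic result can be read against a plain sequential simulator in which the victim line is chosen from per-line reference history. The sketch below assumes, for illustration only, that a reference-based policy keeps each resident line's last-use time and use count (reset when the line is reloaded) and evicts according to a pluggable rule. The names `simulate_set`, `lru_victim`, `lfu_victim`, and `make_random_victim` are hypothetical, and the paper's precise definition of the policy class may differ. This is a sequential reference model of what is being simulated, not the parallel method.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Per-line statistics for resident lines: (last_use_time, use_count).
Stats = Dict[Hashable, Tuple[int, int]]
# Hypothetical victim-selection rule: resident-line stats -> line to evict.
VictimFn = Callable[[Stats], Hashable]


def simulate_set(trace: Sequence[Hashable], capacity: int,
                 choose_victim: VictimFn) -> List[bool]:
    """Sequentially simulate one C-line set; True marks a hit."""
    stats: Stats = {}
    hits: List[bool] = []
    for t, line in enumerate(trace):
        if line in stats:
            _, count = stats[line]
            stats[line] = (t, count + 1)  # update recency and frequency
            hits.append(True)
            continue
        if len(stats) == capacity:
            del stats[choose_victim(stats)]  # set is full: evict one line
        stats[line] = (t, 1)  # assumed: use count restarts on reload
        hits.append(False)
    return hits


def lru_victim(stats: Stats) -> Hashable:
    # Evict the line whose last use is oldest.
    return min(stats, key=lambda x: stats[x][0])


def lfu_victim(stats: Stats) -> Hashable:
    # Evict the line with the fewest uses; break ties by oldest last use.
    return min(stats, key=lambda x: (stats[x][1], stats[x][0]))


def make_random_victim(seed: int) -> VictimFn:
    rng = random.Random(seed)  # seeded so runs are reproducible

    def random_victim(stats: Stats) -> Hashable:
        # Evict a resident line chosen uniformly at random.
        return rng.choice(list(stats))

    return random_victim
```

With C = 2 on the trace `a a b c a`, LRU evicts `a` when `c` arrives and misses on the final reference, while LFU evicts the less-used `b` and hits.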
Index Terms: parallel algorithms; buffer storage; parallel architectures; computational complexity; program diagnostics; massively parallel algorithms; trace-driven cache simulations; least-recently-used policy; EREW parallel model; simulation algorithm; algorithm timings; MasPar MP-1; reference-based line replacement policies; least-frequently-used policy; random replacement policy; trace; space overhead; SIMD implementation
D.M. Nicol, A.G. Greenberg, B.D. Lubachevsky, "Massively Parallel Algorithms for Trace-Driven Cache Simulations," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 8, pp. 849-859, Aug. 1994, doi:10.1109/71.298211