D.M. Nicol, A.G. Greenberg, B.D. Lubachevsky, "Massively Parallel Algorithms for Trace-Driven Cache Simulations," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 8, pp. 849-859, August 1994. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.298211. IEEE Computer Society, Los Alamitos, CA, USA.
Index Terms: parallel algorithms; buffer storage; parallel architectures; computational complexity; program diagnostics; massively parallel algorithms; trace-driven cache simulations; least-recently-used policy; EREW parallel model; simulation algorithm; algorithm timings; MasPar MP-1; reference-based line replacement policies; least-frequently-used policy; random replacement policy; trace; space overhead; SIMD implementation
This paper considers the use of massively parallel architectures to execute a trace-driven simulation of a single cache set. A method is presented for the least-recently-used (LRU) policy that, regardless of the set size C, runs in O(log N) time using N processors on the EREW (exclusive read, exclusive write) parallel model, where N is the length of the trace. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented for this algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is also considered, which includes LRU as well as the least-frequently-used (LFU) and random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
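To make the problem setting concrete, the following is a minimal sequential sketch in Python of what a trace-driven LRU simulation of a single cache set computes. It is not the paper's parallel algorithm. The function names `simulate_lru_set` and `hits_by_reuse_distance` are hypothetical, the trace is assumed to be a list of line tags all directed to one set, and C is assumed to be at least 1. The second function illustrates the well-known reuse-distance view of LRU: a reference hits exactly when fewer than C distinct other lines were referenced since the previous reference to the same line. Each reference's outcome then depends only on a window of the trace, which hints at why the problem admits parallel solutions.

```python
from collections import OrderedDict


def simulate_lru_set(trace, C):
    """Sequentially simulate one LRU cache set of C lines (C >= 1).

    Returns a list of booleans, one per reference: True for a hit.
    """
    lines = OrderedDict()  # keys ordered from least to most recently used
    hits = []
    for tag in trace:
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            hits.append(True)
        else:
            if len(lines) == C:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = None
            hits.append(False)
    return hits


def hits_by_reuse_distance(trace, C):
    """Compute the same hit flags without maintaining the cache state.

    A reference hits iff fewer than C distinct other lines were referenced
    since the previous reference to the same line. This version is
    deliberately naive (quadratic time); it only illustrates the property.
    """
    last_seen = {}
    hits = []
    for i, tag in enumerate(trace):
        j = last_seen.get(tag)
        if j is None:
            hits.append(False)  # first reference to this line always misses
        else:
            distinct_between = len(set(trace[j + 1:i]))
            hits.append(distinct_between < C)
        last_seen[tag] = i
    return hits


if __name__ == "__main__":
    example_trace = list("abcabda")  # hypothetical trace of line tags
    print(simulate_lru_set(example_trace, 3))
    print(hits_by_reuse_distance(example_trace, 3))
```

Both functions return the same hit flags for any trace and any C of at least 1; the sequential one takes time proportional to N, which is the baseline a parallel simulation aims to improve on.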

