D.M. Nicol, A.G. Greenberg, B.D. Lubachevsky, "Massively Parallel Algorithms for Trace-Driven Cache Simulations," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 8, pp. 849-859, August 1994.
@article{10.1109/71.298211,
  author    = {D.M. Nicol and A.G. Greenberg and B.D. Lubachevsky},
  title     = {Massively Parallel Algorithms for Trace-Driven Cache Simulations},
  journal   = {IEEE Transactions on Parallel and Distributed Systems},
  volume    = {5},
  number    = {8},
  issn      = {1045-9219},
  year      = {1994},
  pages     = {849--859},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/71.298211},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
}
Index Terms: parallel algorithms; buffer storage; parallel architectures; computational complexity; program diagnostics; massively parallel algorithms; trace-driven cache simulations; least-recently-used policy; EREW parallel model; simulation algorithm; algorithm timings; MasPar MP-1; reference-based line replacement policies; least-frequently-used policy; random replacement policy; trace; space overhead; SIMD implementation
This paper considers the use of massively parallel architectures to execute a trace-driven simulation of a single cache set. A method is presented for the least-recently-used (LRU) policy which, regardless of the set size C, runs in O(log N) time on a trace of length N using N processors on the EREW (exclusive read, exclusive write) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. We present timings of this algorithm's implementation on the MasPar MP-1, a machine with 16,384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the least-frequently-used (LFU) and random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
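To make concrete what an LRU trace-driven simulation of one C-line set must compute, the Python sketch below gives two equivalent formulations. It is an illustration under stated assumptions, not the authors' EREW algorithm, and the function names (`lru_hits_sequential`, `previous_occurrence`, `lru_hits_by_prev`) are hypothetical. The first function is an ordinary sequential simulation. The second relies on the classic stack-distance (inclusion) property of LRU: reference i hits if and only if fewer than C distinct other lines were referenced since the previous reference to the same line. Once each reference's previous occurrence `prev[i]` is known, this test is independent for every i. That independence is what makes a processor-per-reference formulation plausible. The sketch counts intervening lines naively, in O(N^2) total work, rather than in the O(log N) time the paper reports.

```python
from collections import OrderedDict
from typing import Hashable, List, Sequence


def lru_hits_sequential(trace: Sequence[Hashable], C: int) -> List[bool]:
    """Baseline: simulate one C-line LRU set reference by reference (assumes C >= 1)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    hits: List[bool] = []
    for line in trace:
        if line in stack:
            stack.move_to_end(line)          # line becomes most recently used
            hits.append(True)
        else:
            hits.append(False)
            if len(stack) == C:
                stack.popitem(last=False)    # evict the least recently used line
            stack[line] = None
    return hits


def previous_occurrence(trace: Sequence[Hashable]) -> List[int]:
    """prev[i] = largest j < i with trace[j] == trace[i], or -1 if there is none."""
    last_seen = {}
    prev: List[int] = []
    for i, line in enumerate(trace):
        prev.append(last_seen.get(line, -1))
        last_seen[line] = i
    return prev


def lru_hits_by_prev(trace: Sequence[Hashable], C: int) -> List[bool]:
    """Same hit/miss sequence, but each reference i is decided on its own."""
    prev = previous_occurrence(trace)
    hits: List[bool] = []
    for i, p in enumerate(prev):
        if p < 0:
            hits.append(False)               # first reference to this line: miss
            continue
        # A position j in (p, i) is the first reference to its line inside
        # that window exactly when prev[j] < p, so this counts the distinct
        # lines referenced between the two uses of trace[i].
        distinct = sum(1 for j in range(p + 1, i) if prev[j] < p)
        hits.append(distinct < C)            # hit iff stack distance distinct + 1 <= C
    return hits


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 1]
    print(lru_hits_sequential(trace, 2))  # [False, False, False, False, False, True]
    print(lru_hits_by_prev(trace, 2))     # [False, False, False, False, False, True]
```

With C = 2, lines 2 and 3 push line 1 out before its second use, so only the final reference hits. In a parallel setting, `prev` could plausibly be obtained by a stable sort of (line, index) pairs, and the per-reference counts could be produced by a parallel counting step. How the paper actually achieves its bounds is not shown here.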