A Trace-Driven Simulator for Performance Evaluation of Cache-Based Multiprocessor Systems
September 1995 (vol. 6 no. 9)
pp. 915-929

Abstract: We describe a simulator that emulates the activity of a shared-memory, common-bus multiprocessor system with private caches. Both kernel and user program activities are considered, which allows an accurate analysis and evaluation of coherence protocol performance. The simulator can generate synthetic traces from a wide set of input parameters that specify processor, kernel, and workload features. Other parameters describe the multiprocessor architecture to be analyzed. Simulation driven by actual traces is also possible, so that the performance of a specific multiprocessor can be evaluated for a given workload when traces of that workload are available. In a separate section, we describe how actual traces can also be used to extract a set of input parameters for synthetic trace generation. Finally, we show how the simulator can be employed to carry out a detailed performance analysis of a specific coherence protocol.
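
As an illustration of the general idea of trace-driven simulation of private caches on a shared bus, the following minimal sketch replays a stream of memory references and counts bus transactions. It is not the simulator described in the article. The names `synthetic_trace` and `simulate_msi`, the uniform random reference model, the unbounded cache size, and the choice of a textbook MSI write-invalidate protocol are all assumptions made here for illustration only.

```python
import random
from collections import Counter


def synthetic_trace(n_refs, n_cpus, n_blocks, write_prob, seed=0):
    """Yield (cpu, op, block) references drawn uniformly at random.

    Hypothetical workload model: a real generator would use parameters
    for locality, sharing, and kernel activity.
    """
    rng = random.Random(seed)
    for _ in range(n_refs):
        cpu = rng.randrange(n_cpus)
        op = "W" if rng.random() < write_prob else "R"
        yield cpu, op, rng.randrange(n_blocks)


def simulate_msi(trace, n_cpus):
    """Replay a trace through unbounded private caches kept coherent by MSI.

    Assumption: caches never evict, so every miss is a cold or coherence miss.
    Returns a Counter of hits and bus events.
    """
    caches = [dict() for _ in range(n_cpus)]  # block -> "S" or "M"; absent = invalid
    stats = Counter()
    for cpu, op, block in trace:
        state = caches[cpu].get(block, "I")
        if op == "R":
            if state == "I":
                stats["bus_read"] += 1
                # A remote modified copy is written back and downgraded to shared.
                for other in range(n_cpus):
                    if other != cpu and caches[other].get(block) == "M":
                        caches[other][block] = "S"
                        stats["writeback"] += 1
                caches[cpu][block] = "S"
            else:
                stats["hit"] += 1
        else:
            if state == "M":
                stats["hit"] += 1
            else:
                # Write to an invalid or shared block: gain exclusive ownership.
                stats["bus_read_excl" if state == "I" else "bus_upgrade"] += 1
                for other in range(n_cpus):
                    if other != cpu and block in caches[other]:
                        if caches[other][block] == "M":
                            stats["writeback"] += 1
                        del caches[other][block]
                        stats["invalidation"] += 1
                caches[cpu][block] = "M"
    return stats
```

The same `simulate_msi` function accepts either a hand-written list of references, standing in for an actual trace, or the output of `synthetic_trace`, for example `simulate_msi(synthetic_trace(10000, 4, 64, 0.3), 4)`.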

Index Terms:
Cache memory, multiple cache consistency, coherence protocol, multiprocessor, performance analysis, trace-driven simulation.
Cosimo Antonio Prete, Gianpaolo Prina, Luigi Ricciardi, "A Trace-Driven Simulator for Performance Evaluation of Cache-Based Multiprocessor Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 9, pp. 915-929, Sept. 1995, doi:10.1109/71.466630