An Analysis of Cache Performance for a Hypercube Multicomputer
July 1992 (vol. 3 no. 4)
pp. 421-432
Multicomputer cache simulation results derived from address traces collected from an Intel iPSC/2 hypercube multicomputer are presented. The primary emphasis is on examining how increasing the number of processor nodes executing a parallel application affects the overall multicomputer cache performance. The effects on multicomputer direct-mapped cache performance of application-specific data partitioning, data access patterns, communication distribution, and communication frequency are illustrated. The effects of system accesses on total cache performance are explored, as well as the reasons for application-specific differences in cache behavior for system and user accesses. Comparing user code results with full user and system code analysis reveals the significant effect of system accesses, and this effect increases with multicomputer size. The time distribution of an application's message-passing operations is found to affect cache performance more strongly than the total amount of time spent in message-passing code.
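
For readers unfamiliar with trace-driven simulation, the following is a minimal sketch of how a direct-mapped cache can be simulated from an address trace. It is not the authors' simulator: the function names, the default cache and block sizes, and the example addresses are all assumptions chosen for illustration.

```python
def simulate_direct_mapped(trace, cache_bytes=1024, block_bytes=16):
    """Return (hits, misses) for a byte-address trace on a direct-mapped cache.

    cache_bytes and block_bytes are illustrative defaults, not values
    taken from the paper.
    """
    num_lines = cache_bytes // block_bytes
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // block_bytes    # block number containing this address
        index = block % num_lines      # the single line this block may occupy
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # evict whatever was stored there
    return hits, misses


def miss_ratio(trace, **cache_params):
    """Fraction of accesses in the trace that miss (0.0 for an empty trace)."""
    hits, misses = simulate_direct_mapped(trace, **cache_params)
    total = hits + misses
    return misses / total if total else 0.0


# Made-up traces: user accesses alone, then the same accesses interleaved
# with a hypothetical system access (address 64) that maps to the same line.
user_only = [0, 4, 8, 0, 4, 8]
user_and_system = [0, 4, 64, 8, 0, 64, 4, 8]
print(miss_ratio(user_only, cache_bytes=64, block_bytes=16))
print(miss_ratio(user_and_system, cache_bytes=64, block_bytes=16))
```

In this toy example the interleaved system access evicts the user block each time it runs, so the combined trace misses more often than the user-only trace. This only illustrates the kind of interference the abstract refers to; it does not reproduce any measurement from the paper.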

Index Terms:
hypercube multicomputer; cache simulation; address traces; Intel iPSC/2; processor nodes; parallel application; direct-mapped cache performance; application-specific data partitioning; data access patterns; communication distribution; communication frequency; system accesses; user code; code analysis; time distribution; message-passing code; buffer storage; hypercube networks; parallel programming; performance evaluation; storage management
C.B. Stunkel, W.K. Fuchs, "An Analysis of Cache Performance for a Hypercube Multicomputer," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 4, pp. 421-432, July 1992, doi:10.1109/71.149961