Daniel J. Sorin, Jonathan L. Lemon, Derek L. Eager, Mary K. Vernon, "Analytic Evaluation of Shared-Memory Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 2, pp. 166-180, February 2003. ISSN 1045-9219. DOI: 10.1109/TPDS.2003.1178880. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: analytical model, shared memory multiprocessor, heterogeneity, performance evaluation, mean value analysis.
Abstract—This paper develops and validates an efficient analytical model for evaluating the performance of shared memory architectures with ILP processors. First, we instrument the SimOS simulator to measure the parameters for such a model and we find a surprisingly high degree of processor memory request heterogeneity in the workloads. Examining the model parameters provides insight into application behaviors and how they interact with the system. Second, we create a model that captures such heterogeneous processor behavior, which is important for analyzing memory system design tradeoffs. Highly bursty memory request traffic and lock contention are also modeled in a significantly more robust way than in previous work. With these features, the model is applicable to a wide range of architectures and applications. Although the features increase the model complexity, it is a useful design tool because the size of the model input parameter set remains manageable, and the model is still several orders of magnitude quicker to solve than detailed simulation. Validation results show that the model is highly accurate, producing heterogeneous per processor throughputs that are generally within 5 percent and, for the workloads validated, always within 13 percent of the values measured by detailed simulation with SimOS. Several examples illustrate applications of the model to studying architectural design issues and the interactions between the architecture and the application workloads.
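For readers unfamiliar with mean value analysis (MVA), the technique named in the keywords, the following is a minimal sketch of the textbook exact MVA recursion for a closed, single-class queueing network. It is not the customized heterogeneous model the paper develops. The function name `exact_mva`, the mapping of processors to customers and memory-system resources to queueing centers, and all numeric values are illustrative assumptions rather than anything taken from the paper.

```python
def exact_mva(demands, think_time, n_customers):
    """Textbook exact MVA for a closed, single-class queueing network.

    demands      -- service demand at each queueing center (hypothetical values)
    think_time   -- average time a customer spends outside the centers
    n_customers  -- number of circulating customers (here: processors, by assumption)

    Returns (throughput, residence_times, queue_lengths).
    """
    queue = [0.0] * len(demands)       # mean queue lengths with zero customers
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # An arriving customer sees the queue lengths of the network
        # with one fewer customer (the arrival theorem).
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole network.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


# Illustrative only: two shared resources (say, a bus and a memory module)
# and four processors that compute for 1.0 time units between requests.
x, r, q = exact_mva(demands=[0.5, 0.25], think_time=1.0, n_customers=4)
print(f"throughput={x:.4f}, residence={r}, queues={q}")
```

Each iteration costs time proportional to the number of centers, which suggests why analytic models of this family can be solved far faster than a detailed simulation can run, at the price of the simplifying assumptions built into the recursion.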

