Issue No.03 - March (2008 vol.57)
pp: 389-403
ABSTRACT
Advances in semiconductor technology enable a larger processor design space, leading to increasingly complex systems. Designers must evaluate many architectural design points to arrive at an optimal design. Currently, most architecture exploration is performed with cycle-accurate simulators. Although accurate, these tools are slow, which limits how comprehensively the design space can be searched. The vast design space of today's complex processors, together with time-to-market economic pressures, motivates the need for faster architectural evaluation methods. This paper presents a performance model that enables rapid exploration of the architecture design space for superscalar processors. It supplements current design tools by quickly identifying promising regions for more thorough and time-consuming exploration with traditional tools. The model estimates the instruction throughput of a superscalar processor from early architectural design parameters and application properties. It has been validated against the SimpleScalar out-of-order simulator. Running 40,000 times faster than SimpleScalar, the model produces instruction throughput estimates that are within 5.5% of the corresponding SimpleScalar values.
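The accuracy claim above compares a model's throughput estimate with a simulator's measured throughput. The sketch below shows one common way to express that comparison, assuming the error is the absolute relative difference from the simulator value and, across benchmarks, the arithmetic mean of those differences; the paper may define or aggregate its error differently. The function names and the IPC numbers in the usage note are hypothetical.

```python
def relative_error(model_ipc: float, sim_ipc: float) -> float:
    """Absolute relative difference of a model estimate from the simulator
    reference value, returned as a fraction (0.055 means 5.5%).
    Assumption: the simulator value is treated as ground truth."""
    if sim_ipc <= 0:
        raise ValueError("simulator IPC must be positive")
    return abs(model_ipc - sim_ipc) / sim_ipc


def within_tolerance(model_ipc: float, sim_ipc: float, tol: float = 0.055) -> bool:
    """True if the model estimate lies within `tol` (default 5.5%) of the simulator."""
    return relative_error(model_ipc, sim_ipc) <= tol


def mean_relative_error(pairs) -> float:
    """Average relative error over (model_ipc, sim_ipc) pairs, e.g. one pair
    per benchmark. Assumption: a simple arithmetic mean is used."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (model, simulator) pair")
    return sum(relative_error(m, s) for m, s in pairs) / len(pairs)
```

For example, a hypothetical model estimate of 1.9 instructions per cycle against a simulated 2.0 gives a relative error of about 0.05, which falls within a 5.5% tolerance, while an estimate of 1.8 (error of about 0.10) does not.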
INDEX TERMS
Modeling of computer architecture, Pipeline processors, Modeling techniques
CITATION
Tarek M. Taha, Scott Wills, "An Instruction Throughput Model of Superscalar Processors", IEEE Transactions on Computers, vol. 57, no. 3, pp. 389-403, March 2008, doi:10.1109/TC.2007.70817