Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors
June 1998 (vol. 9 no. 6)
pp. 570-578

Abstract: Instruction fetching is critical to the performance of a superscalar microprocessor. We develop a mathematical model for three different cache techniques and evaluate their performance both in theory and in simulation using the SPEC95 suite of benchmarks. With all three techniques, fetching performance is dramatically lower than ideal expectations. To help remedy the situation, we also evaluate their performance with prefetching. Even so, fetching performance remains fundamentally limited by control transfers. To solve this problem, we introduce a new fetching mechanism called a dual branch target buffer, which enables fetching performance to exceed the limit imposed by conventional methods and achieve a high instruction fetching rate.
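
The abstract does not give the model itself, so the following is only a rough illustration of why control transfers cap the fetch rate, not the authors' model. It assumes, purely for illustration, that each instruction is independently a taken control transfer with probability `p_taken`, and that a fetch block ends at the first taken transfer or after `width` instructions. The function name and both parameters are hypothetical.

```python
def expected_fetch_rate(width: int, p_taken: float) -> float:
    """Expected instructions fetched per cycle under a simple geometric model.

    Assumption (not from the paper): every instruction is, independently,
    a taken control transfer with probability p_taken, and fetching stops
    at the first taken transfer or after `width` instructions.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must lie in [0, 1]")
    # Slot k (1-based) is filled only if none of the k-1 earlier
    # instructions in the block was a taken control transfer.
    return sum((1.0 - p_taken) ** (k - 1) for k in range(1, width + 1))


if __name__ == "__main__":
    # Widening the fetch unit gives diminishing returns: under this
    # assumption the rate can never exceed 1 / p_taken.
    for width in (2, 4, 8, 16):
        print(width, round(expected_fetch_rate(width, 0.125), 3))
```

Under these assumptions the rate approaches 1 / `p_taken` as `width` grows, which is one way to see why a wider conventional fetch unit alone cannot reach the ideal rate.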

Index Terms:
Computer architecture, instruction fetching, superscalar microprocessor, performance analysis, branch target buffer.
Steven Wallace, Nader Bagherzadeh, "Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 6, pp. 570-578, June 1998, doi:10.1109/71.689444