Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, and Douglas W. Clark, "Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques," IEEE Transactions on Computers, vol. 48, no. 11, pp. 1260-1281, November 1999.
DOI: 10.1109/12.811115. ISSN: 0018-9340. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.
Keywords: microarchitecture, trade-offs, branch prediction, cache, sampling, simulation, out-of-order execution, instruction window size, register-update unit.
Abstract—Design parameters interact in complex ways in modern processors, especially because out-of-order issue and decoupling buffers allow latencies to be overlapped. Trade-offs among instruction-window size, branch-prediction accuracy, and instruction- and data-cache size can change as these parameters move through different domains. For example, modeling unrealistic caches can under- or overstate the benefits of better prediction or a larger instruction window. Avoiding such pitfalls requires understanding how

