Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse
August 2003 (vol. 52 no. 8)
pp. 1032-1050

Abstract—The fact that instructions in programs often produce repetitive results has motivated researchers to explore techniques such as value prediction and value reuse to exploit this behavior. Value prediction increases the available Instruction-Level Parallelism (ILP) in superscalar processors by predicting the values of input operands so that dependent instructions can execute speculatively. Value reuse, in contrast, eliminates redundant computation by storing the results that instructions previously produced and skipping the execution of redundant instructions. Previous value reuse mechanisms use a single instruction or a naturally formed instruction group, such as a basic block, a trace, or a function, as the reuse unit. These naturally formed groups are readily identifiable by the hardware at runtime without compiler assistance. However, the performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved each time a reuse unit is skipped. Larger instruction groups typically offer fewer reuse opportunities than smaller groups, but each successful reuse detection saves more work, so it is important to find the balance point that yields the largest overall performance gain. In this paper, we propose a new mechanism called subblock reuse. Subblocks are created by slicing basic blocks either dynamically or with compiler guidance. The dynamic approaches use the number of instructions, the numbers of inputs and outputs, or the presence of store instructions to determine subblock boundaries. The compiler-assisted approach slices basic blocks using data-flow considerations to balance the reuse granularity against the number of reuse opportunities. The results show that subblocks, which can produce speedups of up to 36 percent if reused properly, are better candidates for reuse units than basic blocks.
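The abstract names the dynamic slicing criteria (instruction count, numbers of inputs and outputs, store instructions) but not their exact rules or thresholds. The Python sketch below shows one hypothetical way to apply them greedily to a single basic block. The `Inst` model, the greedy policy, the default limits of four, and the choice to close a subblock right after a store are all assumptions made for illustration, not the paper's hardware design. Inputs are taken to be registers read before being written inside the subblock, and every register written inside it is conservatively counted as an output.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction model: at most one destination register."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]
    is_store: bool = False


def live_ins_and_outs(insts: List[Inst]) -> Tuple[Set[str], Set[str]]:
    """Inputs: registers read before being written in `insts`.
    Outputs: (conservatively) every register written in `insts`."""
    ins: Set[str] = set()
    outs: Set[str] = set()
    for inst in insts:
        for reg in inst.srcs:
            if reg not in outs:
                ins.add(reg)
        if inst.dst is not None:
            outs.add(inst.dst)
    return ins, outs


def slice_basic_block(bb: List[Inst], max_insts: int = 4, max_inputs: int = 4,
                      max_outputs: int = 4,
                      split_at_store: bool = True) -> List[List[Inst]]:
    """Greedily cut one basic block into subblocks (assumed policy)."""
    subblocks: List[List[Inst]] = []
    current: List[Inst] = []
    for inst in bb:
        candidate = current + [inst]
        ins, outs = live_ins_and_outs(candidate)
        over_limit = (len(candidate) > max_insts or len(ins) > max_inputs
                      or len(outs) > max_outputs)
        if over_limit and current:
            # Adding this instruction would break a limit: close the subblock.
            subblocks.append(current)
            current = [inst]
        else:
            # Fits, or is a lone instruction that exceeds the limits by itself.
            current = candidate
        if split_at_store and inst.is_store:
            # Assumption: a store ends the subblock that contains it.
            subblocks.append(current)
            current = []
    if current:
        subblocks.append(current)
    return subblocks
```

Setting `max_insts=2` yields subblocks of at most two consecutive instructions, the restricted form that the abstract reports as performing surprisingly well.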
Although subblock reuse with compiler assistance has a substantial and consistent potential to improve the performance of superscalar processors, this scheme is not always the best performer. Subblocks restricted to two consecutive instructions demonstrate surprisingly good performance potential as well.
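Reusing a subblock means recognizing that it is about to run with input values it has already seen and supplying its recorded outputs instead of executing it. The sketch below models this with a small table keyed by the subblock's starting PC and its input values, evicted in least-recently-used order, and applies it to a two-instruction subblock. The class and function names, the LRU policy, the capacity, the PC value, and representing a subblock's computation as a Python callable are hypothetical simplifications; the abstract does not describe how the hardware reuse buffer is organized.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Values = Tuple[int, ...]


class SubblockReuseBuffer:
    """Hypothetical reuse table mapping (start PC, input values) to output values."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.table: "OrderedDict[Tuple[int, Values], Values]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def execute(self, pc: int, inputs: Values,
                compute: Callable[..., Values]) -> Values:
        key = (pc, inputs)
        if key in self.table:
            # Reuse hit: return the recorded outputs and skip the computation.
            self.hits += 1
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]
        # Miss: execute the subblock normally and record its outputs.
        self.misses += 1
        outputs = compute(*inputs)
        self.table[key] = outputs
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the least recently used entry
        return outputs


def add_then_double(a: int, b: int) -> Values:
    """A two-instruction subblock: t = a + b; u = t << 1. Outputs are (t, u)."""
    t = a + b
    u = t << 1
    return (t, u)


if __name__ == "__main__":
    buf = SubblockReuseBuffer(capacity=2)
    for a, b in [(1, 2), (1, 2), (3, 4), (1, 2)]:
        print(buf.execute(0x400, (a, b), add_then_double))
    print(buf.hits, buf.misses)
```

In this run, the repeated input pair `(1, 2)` hits twice, so `add_then_double` executes only for the two distinct input pairs. Shrinking the capacity to 1 turns the final lookup into a miss, a toy version of the trade-off between reuse opportunities and table resources.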

Index Terms:
Block reuse, subblock reuse, value locality, compiler flow analysis, value reuse.
Citation:
Jian Huang, David J. Lilja, "Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse," IEEE Transactions on Computers, vol. 52, no. 8, pp. 1032-1050, Aug. 2003, doi:10.1109/TC.2003.1223638