Todd C. Mowry and Chi-Keung Luk, "Understanding Why Correlation Profiling Improves the Predictability of Data Cache Misses in Nonnumeric Applications," IEEE Transactions on Computers, vol. 49, no. 4, pp. 369-384, April 2000. doi: 10.1109/12.844349

Keywords: Cache performance, cache miss prediction, correlation-based profiling.

Abstract— Latency-tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses—otherwise the runtime overheads can potentially offset any gains. In this paper, we focus on isolating dynamic miss instances in

