Issue No.02 - July-Dec. (2012 vol.11)
pp: 41-44
Reena Panda, Texas A&M University, College Station
Paul V. Gratz, University of Texas at Austin, Austin
Daniel A. Jimenez, UT San Antonio, San Antonio; Rutgers University, Piscataway
Computer architecture is beset by two opposing trends. Technology scaling and deep pipelining have led to high memory access latencies; meanwhile, power and energy considerations have revived interest in traditional in-order processors. In-order processors, unlike their superscalar counterparts, do not allow execution to continue around data cache misses. They therefore suffer a greater performance penalty from today's high memory access latencies. Memory prefetching is an established technique for reducing the incidence of cache misses and improving performance. In this paper, we introduce B-Fetch, a new data prefetching technique that combines branch-prediction-based lookahead deep path speculation with effective address speculation to efficiently improve performance in in-order processors. Our results show that B-Fetch improves performance by 38.8% on SPEC CPU2006 benchmarks, beating a current state-of-the-art prefetcher design at roughly one third of the hardware overhead.
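The general idea of combining branch-directed lookahead with address speculation can be illustrated with a toy software model. The sketch below is not the B-Fetch hardware design: the `Block` structure, the table-based predictor, the 64-byte line size, and the assumption that register values stay unchanged along the speculated path are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LINE_BYTES = 64  # assumed cache line size (hypothetical)


@dataclass
class Block:
    """Hypothetical basic block: its loads and its two possible successors."""
    loads: List[Tuple[str, int]]      # (base register, offset) pairs
    taken: Optional[str] = None       # successor if the ending branch is taken
    not_taken: Optional[str] = None   # successor on fall-through


def lookahead_prefetch(start: str,
                       blocks: Dict[str, Block],
                       predictor: Dict[str, bool],
                       regs: Dict[str, int],
                       depth: int = 3) -> List[int]:
    """Walk the predicted path and collect cache-line addresses to prefetch.

    Effective addresses are speculated from the *current* register values,
    i.e. this toy model assumes registers do not change along the path.
    """
    lines: List[int] = []
    name: Optional[str] = start
    for _ in range(depth):
        if name is None or name not in blocks:
            break
        block = blocks[name]
        for base, offset in block.loads:
            addr = regs[base] + offset
            line = addr - addr % LINE_BYTES   # align down to the line start
            if line not in lines:             # avoid duplicate prefetches
                lines.append(line)
        # Follow the predicted direction; unknown branches default to not taken.
        name = block.taken if predictor.get(name, False) else block.not_taken
    return lines
```

With a predictor that says block `A` is taken and block `B` is not, a call such as `lookahead_prefetch("A", blocks, predictor, regs, depth=3)` walks `A`, then `B`, then `B`'s fall-through block, and returns one line address per distinct cache line touched by the loads on that path.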
Prefetching, Registers, Process control, Benchmark testing, Computer architecture, Cache memory, Value Prediction, Pipelines, Hardware, In-order Processors, Data Cache Prefetching, Memory Systems, Branch Prediction
Reena Panda, Paul V. Gratz, Daniel A. Jimenez, "B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors", IEEE Computer Architecture Letters, vol. 11, no. 2, pp. 41-44, July-Dec. 2012, doi:10.1109/L-CA.2011.33