Issue No. 05 - May (2013 vol. 62)
pp. 944-955
R. Ubal , Electr. & Comput. Eng. Dept., Northeastern Univ., Boston, MA, USA
J. Sahuquillo , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, Valencia, Spain
S. Petit , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, Valencia, Spain
Pedro Lopez , Dept. of Comput. Eng. (DISCA), Univ. Politec. de Valencia, Valencia, Spain
ABSTRACT
Multicore chips currently dominate the microprocessor market as designs that improve performance while sustaining power consumption. However, complex core features must still be considered to provide good performance for existing sequential applications. An effective approach to reducing core complexity without dramatically sacrificing performance is to distribute critical processor structures by using clustered microarchitectures. In these designs, communication latency among clusters is a critical performance bottleneck, and a good steering algorithm is required to reduce intercluster communication. In this paper, we propose a new energy-efficient microarchitectural approach that reduces intercluster communication by detecting and generating independent chains of instructions, referred to as subtraces, from the execution of sequential programs. The devised mechanism has been modeled on an x86-based trace-cache processor, where subtraces are built in the fill unit, stored in a trace cache, and individually steered to different clusters. Experimental results show that the proposal reaches performance speedups of around 7 and 15 percent for point-to-point and bus-based interconnects, respectively, while achieving energy savings of up to 12 percent.
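The notion of independent subtraces can be illustrated with a small software sketch. The abstract does not give the hardware algorithm used in the fill unit, so the code below is only an assumption-laden illustration: it models each instruction as a hypothetical (destination register, source registers) pair, considers only register read-after-write dependences inside one trace, ignores memory dependences, and groups instructions with a union-find structure. The function names and the round-robin steering policy are invented for this example and are not taken from the paper.

```python
from collections import defaultdict


def split_into_subtraces(trace):
    """Group the instructions of one trace into mutually independent chains.

    `trace` is a list of (dest, srcs) pairs: `dest` is a register name or
    None, `srcs` is a tuple of register names. This format is hypothetical.
    Two instructions end up in the same group when they are connected by
    register read-after-write dependences within the trace.
    """
    parent = list(range(len(trace)))

    def find(i):
        # Find the group representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the oldest instruction as the representative.
            parent[max(ra, rb)] = min(ra, rb)

    last_writer = {}  # register -> index of its most recent producer
    for idx, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            # Registers with no producer in the trace are live-ins and
            # create no dependence between instructions of this trace.
            if reg in last_writer:
                union(idx, last_writer[reg])
        if dest is not None:
            last_writer[dest] = idx

    groups = defaultdict(list)
    for idx in range(len(trace)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


def steer_round_robin(subtraces, num_clusters):
    """Assign each subtrace to a cluster (illustrative policy only)."""
    return {i: i % num_clusters for i in range(len(subtraces))}


if __name__ == "__main__":
    trace = [
        ("r1", ("r0",)),  # 0: reads live-in r0
        ("r2", ("r1",)),  # 1: depends on 0
        ("r4", ("r3",)),  # 2: reads live-in r3, independent of 0 and 1
        ("r5", ("r4",)),  # 3: depends on 2
        ("r6", ("r2",)),  # 4: depends on 1
    ]
    subtraces = split_into_subtraces(trace)
    print(subtraces)                        # [[0, 1, 4], [2, 3]]
    print(steer_round_robin(subtraces, 2))  # {0: 0, 1: 1}
```

In this toy trace the two chains share no values, so placing them on different clusters would require no intercluster communication for values produced inside the trace. That is the property the abstract attributes to subtraces.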
INDEX TERMS
Registers, Clustering algorithms, Program processors, Algorithm design and analysis, Radiation detectors, Multicore processing, subtraces, parallelism, Clustered processors
CITATION
R. Ubal, J. Sahuquillo, S. Petit, Pedro Lopez, J. Duato, "Hardware-based generation of independent subtraces of instructions in clustered processors", IEEE Transactions on Computers, vol. 62, no. 5, pp. 944-955, May 2013, doi:10.1109/TC.2012.42