On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
February 2005 (vol. 16 no. 2)
pp. 130-144

Abstract: Clustering is an effective microarchitectural technique for reducing the impact of wire delays, the complexity, and the power requirements of microprocessors. In this work, we investigate the design of on-chip interconnection networks for clustered superscalar microarchitectures. This new class of interconnects has demands and characteristics different from those of traditional multiprocessor networks. In particular, in a clustered microarchitecture, a low intercluster communication latency is essential for high performance. We propose several point-to-point cluster interconnects and new, improved instruction steering schemes. The results show that these point-to-point interconnects achieve much better performance than bus-based ones, and that the connectivity of the network and effective steering schemes are both key to high performance. We also show that these interconnects can be built with simple hardware and achieve a performance close to that of an idealized contention-free model.

Index Terms:
Clustered microarchitecture, intercluster communication, on-chip interconnects, instruction steering, complexity.
Citation:
Joan-Manuel Parcerisa, Julio Sahuquillo, Antonio González, José Duato, "On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 2, pp. 130-144, Feb. 2005, doi:10.1109/TPDS.2005.23