A Design Methodology for Efficient Application-Specific On-Chip Interconnects
February 2006 (vol. 17 no. 2)
pp. 174-190
Wai Hong Ho, IEEE Computer Society

Abstract: As the level of chip integration continues to advance at a fast pace, the desire for efficient interconnects, whether on-chip or off-chip, is rapidly increasing. Traditional interconnects like buses, point-to-point wires, and regular topologies may suffer from poor resource sharing in the time and space domains, leading to high contention or low resource utilization. In this paper, we propose a design methodology for constructing networks for special-purpose computer systems with well-behaved (known) communication characteristics. A temporal and spatial model is proposed to define the sufficient condition for contention-free communication. Based upon this model, a design methodology using a recursive bisection technique is applied to systematically partition a parallel system such that the required number of links and switches is minimized while achieving low contention. Results show that the design methodology can generate more optimized on-chip networks with up to 60 percent fewer resources than meshes or tori while providing blocking performance closer to that of a fully connected crossbar.
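
The abstract does not spell out the temporal and spatial model, so the following is only a minimal sketch of the general idea, not the paper's formulation. It assumes that two communications can contend only if they overlap in time (temporal) and also share at least one link (spatial); if no pair does both, the set is treated as contention-free. The `Message` type, its fields, and the link names are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Tuple


class Message(NamedTuple):
    """Hypothetical record of one communication event (illustrative only)."""
    name: str
    start: int              # time the message begins using the network
    end: int                # time it finishes (exclusive)
    links: FrozenSet[str]   # links the message occupies along its path


def overlaps_in_time(a: Message, b: Message) -> bool:
    # Temporal test: the half-open intervals [start, end) intersect.
    return a.start < b.end and b.start < a.end


def shares_link(a: Message, b: Message) -> bool:
    # Spatial test: the two paths have at least one link in common.
    return bool(a.links & b.links)


def contending_pairs(messages: List[Message]) -> List[Tuple[str, str]]:
    # A pair may contend only if it overlaps in BOTH time and space.
    return [
        (a.name, b.name)
        for a, b in combinations(messages, 2)
        if overlaps_in_time(a, b) and shares_link(a, b)
    ]


def is_contention_free(messages: List[Message]) -> bool:
    # Assumed sufficient condition: no pair overlaps in both domains.
    return not contending_pairs(messages)


if __name__ == "__main__":
    msgs = [
        Message("m0", 0, 4, frozenset({"A-B"})),
        Message("m1", 2, 6, frozenset({"A-B", "B-C"})),
        Message("m2", 4, 8, frozenset({"A-B"})),
        Message("m3", 0, 8, frozenset({"C-D"})),
    ]
    print(contending_pairs(msgs))
    print(is_contention_free(msgs))
```

Under this assumed condition, messages that reuse a link at disjoint times (such as `m0` and `m2`) or that run concurrently on disjoint links (such as `m0` and `m3`) do not contend, which is the kind of sharing in time and space that the abstract says traditional interconnects handle poorly.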

Index Terms:
On-chip interconnects, communication model, low-contention communication, network partitioning, irregular topology.
Citation:
Wai Hong Ho, Timothy Mark Pinkston, "A Design Methodology for Efficient Application-Specific On-Chip Interconnects," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 2, pp. 174-190, Feb. 2006, doi:10.1109/TPDS.2006.15