Joint Application Mapping/Interconnect Synthesis Techniques for Embedded Chip-Scale Multiprocessors
February 2005 (vol. 16 no. 2)
pp. 99-112

Abstract—As transistor sizes shrink, interconnects represent an increasing bottleneck for chip designers. Several groups are developing new interconnection methods and system architectures to cope with this trend. New architectures require new methods for high-level application mapping and hardware/software codesign. In this paper, we present high-level scheduling and interconnect topology synthesis techniques for embedded multiprocessor systems-on-chip that are streamlined for one or more digital signal processing applications. That is, we seek to synthesize an application-specific interconnect topology. We show that flexible interconnect topologies utilizing low-hop communication between processors offer advantages for reduced power and latency. We show that existing multiprocessor scheduling algorithms can deadlock if the topology graph is not strongly connected, or if a constraint is imposed on the maximum number of hops allowed for communication. We detail an efficient algorithm that can be used in conjunction with existing scheduling algorithms for avoiding this deadlock. We show that it is advantageous to perform application scheduling and interconnect synthesis jointly, and present a probabilistic scheduling/interconnect algorithm that utilizes graph isomorphism to pare the design space.
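The deadlock condition described in the abstract can be illustrated with a small reachability check. The sketch below is not the paper's algorithm: it only shows, under assumed data structures, why a scheduler can get stuck when a producer's processor cannot reach a consumer's processor, either because the topology graph is not strongly connected or because the shortest path exceeds a hop limit. The function names (`hops_from`, `can_communicate`, `is_strongly_connected`) and the example topologies are hypothetical.

```python
from collections import deque


def hops_from(topology, source):
    """Breadth-first search: minimum hop count from source to every reachable processor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def can_communicate(topology, src, dst, max_hops=None):
    """True if dst is reachable from src, optionally within max_hops links."""
    dist = hops_from(topology, src)
    if dst not in dist:
        return False
    return max_hops is None or dist[dst] <= max_hops


def is_strongly_connected(topology):
    """True if every processor can reach every other processor."""
    nodes = set(topology) | {n for targets in topology.values() for n in targets}
    return all(len(hops_from(topology, n)) == len(nodes) for n in nodes)


# Hypothetical topologies, given as directed adjacency lists (processor -> neighbors).
ring = {0: [1], 1: [2], 2: [3], 3: [0]}   # unidirectional ring, strongly connected
chain = {0: [1], 1: [2], 2: []}           # no return path from processor 2

print(is_strongly_connected(ring))              # ring: every pair is reachable
print(is_strongly_connected(chain))             # chain: processor 2 reaches nothing
print(can_communicate(ring, 0, 3, max_hops=2))  # 0 -> 3 needs three hops on the ring
```

If a scheduler assigns a task to processor 2 of the chain and its successor to processor 0, no legal route exists for the data, so the schedule cannot be completed. The same happens on the ring when a two-hop limit is imposed and the tasks land on processors 0 and 3. A check of this kind, run before each task-to-processor assignment, is one simple way to reject such placements.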

Index Terms:
Embedded multiprocessors, interconnect synthesis, scheduling, task graphs.
Neal K. Bambha, Shuvra S. Bhattacharyya, "Joint Application Mapping/Interconnect Synthesis Techniques for Embedded Chip-Scale Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 2, pp. 99-112, Feb. 2005, doi:10.1109/TPDS.2005.20