A Simple Data Transfer Technique Using Local Address for Networks-on-Chips
December 2006 (vol. 17 no. 12)
pp. 1425-1437

Abstract—Networks-on-chips (NoCs) have been studied to connect a number of modules in a chip by introducing a network structure which is similar to that in parallel computers. Since embedded streaming applications usually generate predictable small-sized data traffic, the network structure can be customized to the target traffic. Accordingly, we develop a data transfer technique for simplifying routers for predictable small-sized communication in simple tile-based architectures. A data structure is split into single-flit packets, and a label is attached to each of them in order to route them independently. A label is transferred on dedicated wires beside data lines in a channel by taking advantage of relaxed pin count limitations of a channel. To reduce the wiring area for the label, the label is locally assigned according to a preanalysis of required communication pairs of nodes. Analysis results show that only a 3-bit local label is sufficient to route all data of evaluated streaming applications in the case of a 16-node 2D torus. The required amount of hardware for a router is reduced by 37 percent compared with that for a wormhole packet router with the same number of routing table entries.
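The local-label idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimension-order routing function, the table layout, the first-free label assignment, and all names (`dor_route`, `assign_local_labels`, `deliver`) are assumptions made for illustration only. Each communication pair is traced once in a preanalysis step, every channel hands out labels that are unique only on that channel, and each router stores a table mapping (input channel, input label) to (output channel, output label). The label width is then set by the busiest channel rather than by the total number of nodes or flows.

```python
from collections import defaultdict


def step_toward(cur, target, k):
    """Move one hop around a ring of size k, taking the shorter direction."""
    forward = (target - cur) % k
    return (cur + 1) % k if forward <= k - forward else (cur - 1) % k


def dor_route(src, dst, k=4):
    """Dimension-order (X then Y) path on a k x k torus (assumed routing)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = step_toward(x, dst[0], k)
        path.append((x, y))
    while y != dst[1]:
        y = step_toward(y, dst[1], k)
        path.append((x, y))
    return path


def assign_local_labels(pairs, k=4):
    """Preanalysis: give each flow a label that is unique per channel only."""
    next_free = defaultdict(int)  # directed channel -> next unused label
    tables = defaultdict(dict)    # node -> {(in_channel, label): (out, label)}
    for flow_id, (src, dst) in enumerate(pairs):
        path = dor_route(src, dst, k)
        key = ("inject", flow_id)  # how the flit enters the source router
        for a, b in zip(path, path[1:]):
            channel = (a, b)
            label = next_free[channel]
            next_free[channel] += 1
            tables[a][key] = (channel, label)
            key = (channel, label)
        tables[dst][key] = ("eject", flow_id)
    busiest = max(next_free.values(), default=1)
    label_bits = max(1, (busiest - 1).bit_length())
    return tables, label_bits


def deliver(tables, src, flow_id):
    """Forward one single-flit packet by table lookup; return (node, hops)."""
    node, key = src, ("inject", flow_id)
    hops = 0
    while True:
        out, label = tables[node][key]
        if out == "eject":
            return node, hops
        node, key = out[1], (out, label)  # label is rewritten at every hop
        hops += 1


# Hypothetical communication pairs on a 16-node (4 x 4) torus.
pairs = [((0, 0), (2, 1)), ((1, 0), (2, 1)), ((0, 0), (0, 3))]
tables, label_bits = assign_local_labels(pairs)
print(label_bits, deliver(tables, (0, 0), 0))
```

In this toy setup, two flows share the channels into node (2, 1), so one label bit is enough to tell them apart; a global address would need four bits to name one of 16 nodes. The 3-bit figure reported in the abstract comes from the authors' own traffic analysis and is not reproduced by this sketch.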

Index Terms:
Networks-on-chips, on-chip interconnects, table-lookup routing, streaming processing, reconfigurable systems, interconnection networks.
Citation:
Michihiro Koibuchi, Kenichiro Anjo, Yutaka Yamada, Akiya Jouraku, Hideharu Amano, "A Simple Data Transfer Technique Using Local Address for Networks-on-Chips," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 12, pp. 1425-1437, Dec. 2006, doi:10.1109/TPDS.2006.166