Issue No.03 - March (2009 vol.20)
pp: 331-345
Shuyi Shao , University of Pittsburgh, Pittsburgh
Alex K. Jones , University of Pittsburgh, Pittsburgh
Rami Melhem , University of Pittsburgh, Pittsburgh
In this paper, we explore compiler techniques for achieving efficient communication on circuit-switched interconnection networks. We propose a compilation framework that identifies communication patterns and compiles them into network configuration directives. This can provide significant performance benefits when connections are established in the network before the actual communication takes place. The framework includes a flexible and powerful representation scheme that captures the properties of communication patterns and allows these patterns to be manipulated, so that communication phases can be identified within the application. Additionally, we extend the classification of communications as static or dynamic to include persistent communications, a subclass of dynamic communications that remain unchanged for large segments of the application's execution. We have developed an experimental compiler that implements the framework and detects both static and persistent communications within an application. We show that, for the NAS Parallel Benchmarks, 100% of the point-to-point communications can be classified as either static or persistent, and 100% of the collectives are either static or persistent, with the exception of IS. Simulation-based performance analysis demonstrates the benefit of using our compiler techniques to achieve efficient communication in multiprocessor systems.
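The three-way classification described above can be illustrated with a small sketch. This is not the paper's compiler analysis, which works on program source at compile time; it is a toy classifier over a recorded list of destinations, and the names `CommSite`, `run_lengths`, `classify`, and the `min_run` threshold are all hypothetical. It assumes a call site is static when its destination is known at compile time, persistent when its run-time destination changes only between long unchanged runs, and dynamic otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommSite:
    """One communication call site (hypothetical model, for illustration only)."""
    name: str
    compile_time_dest: Optional[int]  # destination rank if known statically, else None
    observed_dests: List[int]         # destination rank seen on each invocation


def run_lengths(values: List[int]) -> List[int]:
    """Return the lengths of maximal runs of identical consecutive values."""
    runs: List[List[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [length for _, length in runs]


def classify(site: CommSite, min_run: int = 4) -> str:
    """Classify a call site as 'static', 'persistent', or 'dynamic'.

    min_run is an assumed threshold for what counts as a "large segment";
    the source text does not give a numeric definition.
    """
    if site.compile_time_dest is not None:
        return "static"
    runs = run_lengths(site.observed_dests)
    if runs and min(runs) >= min_run:
        return "persistent"
    return "dynamic"


if __name__ == "__main__":
    sites = [
        CommSite("halo_exchange", compile_time_dest=1, observed_dests=[1, 1, 1]),
        CommSite("phase_partner", None, [2, 2, 2, 2, 5, 5, 5, 5]),
        CommSite("irregular_send", None, [3, 0, 7, 3, 1]),
    ]
    for s in sites:
        print(s.name, classify(s))
```

In this model, a persistent site is the interesting case for circuit switching: the destination is not known statically, yet each connection could be set up once and reused for a whole run of invocations.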
Circuit-switching networks, Compilers
Shuyi Shao, Alex K. Jones, Rami Melhem, "Compiler Techniques for Efficient Communications in Circuit Switched Networks for Multiprocessor Systems", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 3, pp. 331-345, March 2009, doi:10.1109/TPDS.2008.82