Issue No. 12 - Dec. 2012 (vol. 23)
pp. 2245-2253
Peng Zhang , Stony Brook University, Stony Brook
Yuefan Deng , Stony Brook University, Stony Brook
ABSTRACT
Broadcast algorithms for interlaced bypass torus (iBT) networks are introduced to balance all-port bandwidth efficiency and to avoid congestion in multidimensional cases. With these algorithms, we numerically analyze the dependence of broadcast efficiency on packet-sending patterns, bypass schemes, network sizes, and dimensionalities, and then strategically tune the configurations to minimize the number of broadcast steps. Leveraging this analysis, we compare the performance of networks with one million nodes in two cases: one with added fixed-length bypass links and the other with an added torus dimension. A case study of iBT(1000^2; b = ⟨8, 32⟩) versus Torus(100^3) shows that the former improves the diameter, average node-to-node distance, and rectangular and global broadcasts over the latter by approximately 80 percent. This reaffirms that strategically interlacing short bypass links, and methodically utilizing them, is superior to adding torus dimensions for achieving a shorter diameter, shorter average node-to-node distances, and faster broadcasts.
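The scale of the diameter comparison can be sketched from first principles. This is an illustrative back-of-envelope check, not the paper's own construction: the diameter of an n-dimensional k-ary torus is n⌊k/2⌋, and a breadth-first search over a single 1000-node ring augmented with ±8 and ±32 shortcut links at every node (a simplification of iBT, whose interlaced bypass links attach only to subsets of nodes) shows how short bypasses collapse hop distances.

```python
from collections import deque

def torus_diameter(k, n):
    # Diameter of an n-dimensional k-ary torus: each dimension
    # contributes at most floor(k/2) hops via wrap-around links.
    return n * (k // 2)

def ring_with_bypass_diameter(k, bypass_lengths):
    # BFS from node 0 over a k-node ring where, besides the +-1
    # torus links, every node also has +-L links for each bypass
    # length L. (Simplified stand-in for iBT: in the real topology
    # the bypass links are interlaced over subsets of nodes.)
    steps = [1, -1] + [s * L for L in bypass_lengths for s in (1, -1)]
    dist = {0: 0}
    q = deque([0])
    while q:
        u = q.popleft()
        for s in steps:
            v = (u + s) % k
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())  # eccentricity of node 0 = diameter (vertex-transitive)

print(torus_diameter(100, 3))                    # 3 * 50 = 150
print(torus_diameter(1000, 2))                   # 2 * 500 = 1000
print(ring_with_bypass_diameter(1000, (8, 32)))  # 21: bypasses cut 500 hops per ring to ~21
```

Even this crude model shows the effect the abstract reports: adding ⟨8, 32⟩ bypass links shrinks the worst-case distance along a 1000-node ring from 500 hops to about 21, whereas folding the same million nodes into a third torus dimension only brings the diameter down to 150.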
INDEX TERMS
Algorithm design and analysis, Parallel programming, Bandwidth, Broadcasting, Multiprocessor interconnection, interlaced bypass torus, Parallel computing, collective communication, broadcast, network performance
CITATION
Peng Zhang and Yuefan Deng, "Design and Analysis of Pipelined Broadcast Algorithms for the All-Port Interlaced Bypass Torus Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 12, pp. 2245-2253, Dec. 2012, doi:10.1109/TPDS.2012.93