Issue No.01 - January/February (2012 vol.32)
pp: 8-20
Yutong Lu, National University of Defense Technology, China
Kefei Wang, National University of Defense Technology, China
Min Xie, National University of Defense Technology, China
Hongjia Cao, National University of Defense Technology, China
Xuejun Yang, National University of Defense Technology, China
ABSTRACT
The petascale supercomputer Tianhe-1A, which features hybrid multicore CPU and GPU computing, achieves an optimized balance of computation and communication capabilities through a proprietary high-bandwidth, low-latency interconnect fabric. The authors' message-passing service, based on scalable user-level communication and offloaded operations for large-scale, low-latency collective communication, has achieved a unidirectional bandwidth of 6,340 Mbytes/s.
INDEX TERMS
MPI, collective communication, user-level communication, TianHe-1A, supercomputer, CPU, GPU
CITATION
Yutong Lu, Kefei Wang, Min Xie, Hongjia Cao, Xuejun Yang, "Tianhe-1A Interconnect and Message-Passing Services," IEEE Micro, vol. 32, no. 1, pp. 8-20, January/February 2012, doi:10.1109/MM.2011.97
REFERENCES
1. X.J. Yang et al., "The TianHe-1A Supercomputer: Its Hardware and Software," J. Computer Science and Technology, vol. 26, no. 3, 2011, pp. 344-351.
2. S. Scott et al., "The BlackWidow High-Radix Clos Network," Proc. 33rd Int'l Symp. Computer Architecture (ISCA 06), IEEE CS Press, 2006, pp. 16-28.
3. B.N. Chun, A.M. Mainwaring, and D.E. Culler, "Virtual Network Transport Protocols for Myrinet," IEEE Micro, vol. 18, no. 1, 1998, pp. 53-63.
4. R. Bhoedjang, T. Ruhl, and H.E. Bal, "Design Issues for User-Level Network Interface Protocols on Myrinet," Computer, vol. 31, no. 11, 1998, pp. 53-60.
5. J. Beecroft et al., "QsNetII: Defining High-Performance Network Design," IEEE Micro, vol. 25, no. 4, 2005, pp. 34-47.
6. I. Schoinas and M.D. Hill, "Address Translation Mechanisms in Network Interfaces," Proc. Int'l Symp. High-Performance Computer Architecture (HPCA 98), IEEE CS Press, 1998, pp. 219-230.
7. T. Hoefler, T. Schneider, and A. Lumsdaine, "Characterizing the Influence of System Noise on Large-Scale Applications by Simulation," Proc. ACM/IEEE Int'l Conf. High Performance Computing, Networking, Storage and Analysis (SC 10), IEEE CS Press, 2010, doi: 10.1109/SC.2010.12.
8. K. Underwood et al., "Enabling Flexible Collective Communication Offload with Triggered Operations," Proc. 19th IEEE Ann. Symp. High-Performance Interconnects (HOTI 11), IEEE CS Press, 2011, pp. 35-42.
9. D. Hensgen, R. Finkel, and U. Manber, "Two Algorithms for Barrier Synchronization," Int'l J. Parallel Programming, vol. 17, no. 1, 1988, pp. 1-17.
10. M. Lauria, S. Pakin, and A.A. Chien, "Efficient Layering for High Speed Communication: Fast Messages 2.x," Proc. 7th IEEE High-Performance Distributed Computing Conf. (HPDC 7), IEEE CS Press, 1998, pp. 10-20.
11. J. Liu and D.K. Panda, "Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand," Proc. 18th Int'l Parallel and Distributed Processing Symp. (IPDPS 04), IEEE CS Press, 2004, doi: 10.1109/IPDPS.2004.1303193.
12. J. Liu, J. Wu, and D.K. Panda, "High Performance RDMA-Based MPI Implementation over InfiniBand," Int'l J. Parallel Programming, vol. 32, no. 3, 2004, pp. 167-198.
13. H. Tezuka et al., "Pin-Down Cache: A Virtual Memory Management Technique for Zero-Copy Communication," Proc. 1st Merged Int'l Parallel Processing Symp. & Symp. Parallel and Distributed Processing (IPPS/SPDP 98), IEEE CS Press, 1998, pp. 308-315.
14. "MVAPICH: MPI over InfiniBand, 10GigE/iWARP and RoCE"; http://mvapich.cse.ohio-state.edu/.
15. J.S. Vetter and F. Mueller, "Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures," J. Parallel and Distributed Computing, vol. 63, no. 9, 2003, pp. 853-865.